.copyrightignore

requirements.txt
MANIFEST.in
.gitignore
tests/rings/samples/*


.gitreview

[gerrit]
host=gerrit.suse.provo.cloud
port=29418
project=ardana/swiftlm.git
defaultremote=ardana
defaultbranch=master


LICENSE

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction, and
distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the
copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other
entities that control, are controlled by, or are under common control with
that entity. For the purposes of this definition, "control" means (i) the
power, direct or indirect, to cause the direction or management of such
entity, whether by contract or otherwise, or (ii) ownership of fifty percent
(50%) or more of the outstanding shares, or (iii) beneficial ownership of
such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising
permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation source, and
configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. 
Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. 
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. 
Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. 
MANIFEST.in

exclude .gitignore
exclude .gitreview
recursive-include doc *
include requirements.txt
include test-requirements.txt
include test_*py
include LICENSE
recursive-include tests *


README.md

(c) Copyright 2015 Hewlett Packard Enterprise Development LP
(c) Copyright 2017 SUSE LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

# Swiftlm - Swift Lifecycle Management

XXX

## Docs

XXX

## Code organization

* bin/: Executable scripts that are the processes run by the deployer
* doc/: Documentation
* etc/: Sample config files
* swiftlm/: Core code
* tests/: Unit and functional tests


TODO.md

1. Too much mocking in tests; many mocks are unused or merely repetitive.
2. Tests use separate data files. If the data is really required, it should
   exist as constants in each test file.
3. Invalid constant names in the core code.
4. Inconsistent return values from each check's `main()`.
5. There appears to be no logging in the core code.
6. Separate constant files in the core code. These values should be placed
   in config files rather than in the code itself.
7. Utility commands are spread across multiple files, and many are class
   methods where plain functions would suit better.
8. Utility functions shell out for text processing using `grep`, `awk`,
   and `sed`.
9. Unneeded Message class in some check files.
10. Checks should return Python structures. The `if __name__ == '__main__':`
    block can format them for printing if called as a single script.

## Nits

11. Backslash continuation rather than brackets.
12. Old-style text formatting: `%s` instead of `{}`.
13. Old-style octal literals: `0777` instead of `0o777`.
14. Using `ast.literal_eval`; hopefully this can be removed.
15. Add a utility function for `Popen` if we are going to be using it so
    much. 99% of the time we want PIPE stdout+stdin.


ardana-ci/project/input-model/data/control_plane.yml

#
# (c) Copyright 2015-2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
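TODO items 10 and 15 above could be addressed together with one small helper. The sketch below is only an illustration of what item 15 proposes; the name `run_cmd` and its signature are invented here, not part of swiftlm:

```python
import shlex
import subprocess


def run_cmd(cmd, input_text=None, timeout=None):
    """Run a command with stdout/stderr captured via PIPE.

    Hypothetical helper for TODO item 15: centralise the Popen plumbing
    that nearly every check repeats. Returns a plain Python structure
    (TODO item 10) instead of printing.
    """
    if isinstance(cmd, str):
        cmd = shlex.split(cmd)
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE if input_text is not None else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(
        input=input_text.encode() if input_text is not None else None,
        timeout=timeout,
    )
    return proc.returncode, out.decode(), err.decode()
```

A check's `main()` could then return `run_cmd(...)` results as data, leaving formatting to the `__main__` block.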
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
---
product:
  version: 2

control-planes:
  - name: ccp
    control-plane-prefix: ccp
    region-name: region1
    failure-zones:
      - AZ1
      - AZ2
      - AZ3
    common-service-components:
      - lifecycle-manager-target
      - monasca-agent
    clusters:
      - name: cluster1
        cluster-prefix: c1
        server-role:
          - SERVER1-ROLE
        member-count: 1
        allocation-policy: strict
        service-components:
          - lifecycle-manager
          - openstack-client
          - ntp-server
          - ntp-client
          - kafka
          - zookeeper
          - vertica
          - monasca-api
          - monasca-persister
          - ntp-server
          - swift-client
          - memcached
          - mysql
          - rabbitmq
          - keystone-api
          - keystone-client
          - ip-cluster
          - tempest
      - name: cluster2
        cluster-prefix: c2
        server-role:
          - SWPAC-ROLE
        member-count: 3
        allocation-policy: strict
        service-components:
          - ntp-client
          - swift-proxy
          - swift-account
          - swift-container
          - swift-ring-builder
          - swift-client
    resources:
      - name: resource1
        resource-prefix: r1
        server-role:
          - SWOBJ-ROLE
        allocation-policy: strict
        min-count: 3
        service-components:
          - ntp-client
          - swift-object


ardana-ci/project/input-model/data/disks.yml

#
# (c) Copyright 2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
---
product:
  version: 2

disk-models:
- name: DISKS
  volume-groups:
    # The policy is not to consume 100% of the space of each volume group.
    # 5% should be left free for snapshots and to allow for some flexibility.
    # sda_root is a templated value to align with whatever partition is really used
    # This value is checked in os config and replaced by the partition actually used
    # on sda e.g. sda1 or sda5
    - name: ardana-vg
      physical-volumes:
        - /dev/sda_root
      logical-volumes:
        - name: root
          size: 75%
          fstype: ext4
          mount: /
        - name: log
          size: 15%
          mount: /var/log
          fstype: ext4
          mkfs-opts: -O large_file
        - name: crash
          size: 2%
          mount: /var/crash
          fstype: ext4
          mkfs-opts: -O large_file

- name: SWPAC-DISKS
  volume-groups:
    # The policy is not to consume 100% of the space of each volume group.
    # 5% should be left free for snapshots and to allow for some flexibility.
    # sda_root is a templated value to align with whatever partition is really used
    # This value is checked in os config and replaced by the partition actually used
    # on sda e.g. sda1 or sda5
    - name: ardana-vg
      physical-volumes:
        - /dev/sda_root
      logical-volumes:
        - name: root
          size: 75%
          fstype: ext4
          mount: /
        - name: log
          size: 15%
          mount: /var/log
          fstype: ext4
          mkfs-opts: -O large_file
        - name: crash
          size: 2%
          mount: /var/crash
          fstype: ext4
          mkfs-opts: -O large_file
  device-groups:
    - name: swiftpac
      devices:
        - name: /dev/sdb
        - name: /dev/sdc
        - name: /dev/sdd
        - name: /dev/sde
        - name: /dev/sdf
      consumer:
        name: swift
        attrs:
          rings:
            - account
            - container

- name: SWOBJ-DISKS
  volume-groups:
    # The policy is not to consume 100% of the space of each volume group.
    # 5% should be left free for snapshots and to allow for some flexibility.
    # sda_root is a templated value to align with whatever partition is really used
    # This value is checked in os config and replaced by the partition actually used
    # on sda e.g. sda1 or sda5
    - name: ardana-vg
      physical-volumes:
        - /dev/sda_root
      logical-volumes:
        - name: root
          size: 75%
          fstype: ext4
          mount: /
        - name: log
          size: 15%
          mount: /var/log
          fstype: ext4
          mkfs-opts: -O large_file
        - name: crash
          size: 2%
          mount: /var/crash
          fstype: ext4
          mkfs-opts: -O large_file
  device-groups:
    - name: swiftobj
      devices:
        - name: /dev/sdb
        - name: /dev/sdc
        - name: /dev/sdd
        - name: /dev/sde
        - name: /dev/sdf
      consumer:
        name: swift
        attrs:
          rings:
            - object-0
            - object-1


ardana-ci/project/input-model/data/server_roles.yml

#
# (c) Copyright 2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
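The disk-model comments above size the ardana-vg logical volumes as percentages and deliberately leave headroom. A quick sketch of that arithmetic (the 500 GiB volume-group size is an invented example, not from the model):

```python
# Logical-volume sizes from the DISKS model above, as fractions of the VG.
lv_sizes = {'root': 0.75, 'log': 0.15, 'crash': 0.02}

vg_gib = 500  # example volume-group size in GiB; invented for illustration

allocated = sum(lv_sizes.values())   # 0.92 of the volume group
headroom = 1.0 - allocated           # ~0.08 left unallocated

# Policy check: at least 5% free for snapshots and flexibility.
assert headroom >= 0.05

for name, frac in lv_sizes.items():
    print(f"{name}: {frac * vg_gib:.0f} GiB")
print(f"free: {headroom * vg_gib:.0f} GiB")
```

With these fractions the model allocates 92% of the group, comfortably above the stated 5% free-space policy.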
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
---
product:
  version: 2

server-roles:
  - name: SERVER1-ROLE
    interface-model: NET-INTERFACES
    disk-model: DISKS
  - name: SWPAC-ROLE
    interface-model: NET-INTERFACES
    disk-model: SWPAC-DISKS
  - name: SWOBJ-ROLE
    interface-model: NET-INTERFACES
    disk-model: SWOBJ-DISKS


ardana-ci/project/input-model/data/servers.yml

#
# (c) Copyright 2015-2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
---
product:
  version: 2

baremetal:
  netmask: 255.255.255.0
  subnet: 192.168.110.0
  server-interface: eth2

servers:
  - id: server1
    ip-addr: 192.168.110.3
    role: SERVER1-ROLE
    server-group: AZ1
    mac-addr: a4:93:0c:4f:7c:73
    nic-mapping: VAGRANT
    ilo-ip: 192.168.109.3
    ilo-password: password
    ilo-user: admin
  - id: server2
    ip-addr: 192.168.110.4
    role: SWPAC-ROLE
    server-group: AZ1
    mac-addr: b2:72:8d:ac:7c:6f
    nic-mapping: VAGRANT
    ilo-ip: 192.168.109.4
    ilo-password: password
    ilo-user: admin
  - id: server3
    ip-addr: 192.168.110.5
    role: SWPAC-ROLE
    server-group: AZ2
    mac-addr: 8a:8e:64:55:43:76
    nic-mapping: VAGRANT
    ilo-ip: 192.168.109.5
    ilo-password: password
    ilo-user: admin
  - id: server4
    ip-addr: 192.168.110.6
    role: SWPAC-ROLE
    server-group: AZ3
    mac-addr: 8a:8e:64:55:44:77
    nic-mapping: VAGRANT
    ilo-ip: 192.168.109.6
    ilo-password: password
    ilo-user: admin
  - id: server5
    ip-addr: 192.168.110.7
    role: SWOBJ-ROLE
    server-group: AZ1
    mac-addr: 8a:8e:64:55:42:79
    nic-mapping: VAGRANT
    ilo-ip: 192.168.109.7
    ilo-password: password
    ilo-user: admin
  - id: server6
    ip-addr: 192.168.110.8
    role: SWOBJ-ROLE
    server-group: AZ2
    mac-addr: 8a:8e:64:55:46:75
    nic-mapping: VAGRANT
    ilo-ip: 192.168.109.8
    ilo-password: password
    ilo-user: admin
  - id: server7
    ip-addr: 192.168.110.9
    role: SWOBJ-ROLE
    server-group: AZ3
    mac-addr: 8a:8e:64:57:41:82
    nic-mapping: VAGRANT
    ilo-ip: 192.168.109.9
    ilo-password: password
    ilo-user: admin


ardana-ci/project/input-model/data/swift/rings.yml

#
# (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in
# compliance with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
---
product:
  version: 2

ring-specifications:
  - region-name: region1
    swift-zones:
      - id: 1
        server-groups:
          - CLOUD
    rings:
      - name: account
        display-name: Account Ring
        min-part-hours: 16
        partition-power: 12
        replication-policy:
          replica-count: 3
      - name: container
        display-name: Container Ring
        min-part-hours: 16
        partition-power: 12
        replication-policy:
          replica-count: 3
      - name: object-0
        display-name: General
        default: no
        min-part-hours: 16
        partition-power: 12
        replication-policy:
          replica-count: 3
      - name: object-1
        display-name: Erasure-Code-Ring
        default: yes
        min-part-hours: 16
        partition-power: 12
        erasure-coding-policy:
          ec-type: jerasure_rs_vand
          ec-num-data-fragments: 10
          ec-num-parity-fragments: 4
          ec-object-segment-size: 1048576


ardana-ci/tests/run-tempest.bash

#!/bin/bash
#
# Copyright 2016 Hewlett Packard Enterprise Development LP
# Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#

set -eux
set -o pipefail

ansible-playbook -i hosts/verb_hosts tempest-test-resources.yml

sudo -u tempest /opt/stack/service/tempest/bin/ardana-tempest.sh <<< "+tempest\.api\.object_storage"


ardana-ci/tests/test-plan.yaml

#
# Copyright 2016 Hewlett Packard Enterprise Development LP
# Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
---
- name: Test that we have a good cloud
  logfile: tempest.log
  prefix: tempest
  exec:
    - validate-swift.bash
  tempest:
    - "+tempest.api.object_storage"

- name: Test reconfigure
  logfile: testsuite-reconfigure.log
  prefix: reconfigure
  playbooks:
    - swift-reconfigure.yml
  exec:
    - validate-swift.bash

- name: Test stop/start of swift
  logfile: stop-start.log
  prefix: stop-start
  playbooks:
    - swift-stop.yml
    - swift-start.yml
  exec:
    - validate-swift.bash

- name: Test swift compare model rings
  logfile: swift-compare-model-rings.log
  prefix: swift-compare-model-rings
  playbooks:
    - swift-compare-model-rings.yml
  exec:
    - validate-swift.bash

- name: Test swift dispersion populate
  logfile: swift-dispersion-populate.log
  prefix: swift-dispersion-populate
  playbooks:
    - swift-dispersion-populate.yml
  exec:
    - validate-swift.bash

- name: Test swift dispersion report
  logfile: swift-dispersion-report.log
  prefix: swift-dispersion-report
  playbooks:
    - swift-dispersion-report.yml
  exec:
    - validate-swift.bash

- name: Test swift reconfigure credentials change
  logfile: swift-reconfigure-credentials-change.log
  prefix: swift-reconfigure-credentials-change
  playbooks:
    - swift-reconfigure-credentials-change.yml
  exec:
    - validate-swift.bash

- name: Test swift upgrade
  logfile: swift-upgrade.log
  prefix: swift-upgrade
  playbooks:
    - swift-upgrade.yml
  exec:
    - validate-swift.bash

- name: Test clusters minus one proxy server
  # can't bring down object node as erasure coding requires at least 11 disks
  logfile: shutdown-nodes.log
  prefix: shutdown-nodes
  vms:
    - shutdown:
        - server4
  tempest:
    - "+tempest.api.object_storage"

- name: Bring up downed server and start services
  logfile: restart-server.log
  prefix: restart-server
  vms:
    - start:
        - server4
  playbooks:
    - swift-start.yml
  exec:
    - validate-swift.bash


ardana-ci/tests/validate-swift.bash

#!/bin/bash
#
# Copyright 2016 Hewlett Packard Enterprise Development LP
# Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#

ansible-playbook -i hosts/verb_hosts swift-status.yml


ardana-ci/tiny/input-model/data/control_plane.yml

#
# (c) Copyright 2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
---
product:
  version: 2

control-planes:
  - name: ccp
    control-plane-prefix: ccp
    region-name: region1
    failure-zones:
      - AZ1
      - AZ2
      - AZ3
    common-service-components:
      - lifecycle-manager-target
    clusters:
      - name: cluster0
        cluster-prefix: c0
        server-role:
          - SERVER1-ROLE
        member-count: 1
        allocation-policy: strict
        service-components:
          - lifecycle-manager
          - lifecycle-manager-target
          - openstack-client
          - mysql
          - ip-cluster
          - keystone-api
          - keystone-client
          - rabbitmq
          - swift-ring-builder
          - swift-proxy
          - swift-account
          - swift-container
          - swift-object
          - swift-client
          - memcached


ardana-ci/tiny/input-model/data/disks.yml

#
# (c) Copyright 2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
---
product:
  version: 2

disk-models:
- name: DISKS
  volume-groups:
    # The policy is not to consume 100% of the space of each volume group.
    # 5% should be left free for snapshots and to allow for some flexibility.
    # sda_root is a templated value to align with whatever partition is really used
    # This value is checked in os config and replaced by the partition actually used
    # on sda e.g.
    # sda1 or sda5
    - name: ardana-vg
      physical-volumes:
        - /dev/sda_root
      logical-volumes:
        - name: root
          size: 75%
          fstype: ext4
          mount: /
        - name: log
          size: 15%
          mount: /var/log
          fstype: ext4
          mkfs-opts: -O large_file
        - name: crash
          size: 2%
          mount: /var/crash
          fstype: ext4
          mkfs-opts: -O large_file
  device-groups:
    - name: swiftobj
      devices:
        - name: /dev/sdb
        - name: /dev/sdc
        - name: /dev/sdd
        - name: /dev/sde
        - name: /dev/sdf
      consumer:
        name: swift
        attrs:
          rings:
            - account
            - container
            - object-0


ardana-ci/tiny/input-model/data/servers.yml

#
# (c) Copyright 2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
# --- product: version: 2 baremetal: netmask: 255.255.255.0 subnet: 192.168.110.0 server-interface: eth2 servers: - id: server1 ip-addr: 192.168.110.3 role: SERVER1-ROLE server-group: AZ1 mac-addr: a4:93:0c:4f:7c:73 nic-mapping: VAGRANT ilo-ip: 192.168.109.3 ilo-password: password ilo-user: admin 0707010000001A000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000005000000000python-swiftlm-8.0+git.1541434883.e0ebe69/ardana-ci/tiny/input-model/data/swift0707010000001B000081A4000003E800000064000000015BE06E0300000561000000000000000000000000000000000000005A00000000python-swiftlm-8.0+git.1541434883.e0ebe69/ardana-ci/tiny/input-model/data/swift/rings.yml# # (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. 
# --- product: version: 2 ring-specifications: - region-name: region1 swift-zones: - id: 1 server-groups: - CLOUD rings: - name: account display-name: Account Ring min-part-hours: 16 partition-power: 12 replication-policy: replica-count: 3 - name: container display-name: Container Ring min-part-hours: 16 partition-power: 12 replication-policy: replica-count: 3 - name: object-0 display-name: General default: yes min-part-hours: 16 partition-power: 12 replication-policy: replica-count: 3 0707010000001C000041ED000003E800000064000000035BE06E0300000000000000000000000000000000000000000000002E00000000python-swiftlm-8.0+git.1541434883.e0ebe69/doc0707010000001D000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000003500000000python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source0707010000001E000081A4000003E800000064000000015BE06E0300000000000000000000000000000000000000000000003E00000000python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source/Makefile0707010000001F000081A4000003E800000064000000015BE06E0300001697000000000000000000000000000000000000004C00000000python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source/access_log_metrics.rst .. code:: (c) Copyright 2016 Hewlett Packard Enterprise Development LP (c) Copyright 2017 SUSE LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Access Log Metrics (swiftlm-access-log-tailer) ============================================== Introduction ------------ The access log (aka the swift proxy log) is scanned by the swiftlm-access-log-tailer program. 
This program identifies and counts operations made against accounts (aka projects). The purpose of the metrics is to calculate the rate at which operations are being performed and to calculate the number of bytes written and read. The data is reported every minute i.e., the accesses during the preceding minute are counted and reported on at the end of that minute. In general, the data is reported as: - Totals for all projects accessed during the period - Per-project accesses. These metrics include a project dimension. The metrics are written as a json file. This file can then be consumed by the Monasca Swiftlm Plugin. Since the metrics are reported by each proxy server and since requests are randomly distributed among proxy servers, the values for any given host do not have much meaning. Instead, these metrics are mostly designed to be consumed by the Monasca Transform process that will aggregate values across all proxy servers. .. _swiftlm-access-log-tailer-metrics: Metrics Produced by swiftlm-access-log-tailer-metrics ----------------------------------------------------- The Monasca Agent will add additional dimensions such as: * hostname * cluster * control_plane These are not listed in the dimensions below. * swiftlm.access.host.operation.ops - The number of API requests made during the last minute to this host - Dimensions: * service: object-storage - Value Class: Value - Value Meta: None - Description This metric is a count of all the API requests made to Swift that were processed by this host during the last minute. Requests to /healthcheck and /info are not counted. The count includes invalid requests, so the value may be larger than an aggregation of swiftlm.access.host.operation.project.ops because that only counts operations to an identified project.
* swiftlm.access.host.operation.project.ops - The number of API requests made during the last minute to this host for a specific project - Dimensions: * tenant_id: the project id being accessed * service: object-storage - Value Class: Value - Value Meta: None - Description This metric is a count of all the API requests made to Swift that were processed by this host during the last minute to a given project id. All requests, whether successful or not, are counted. The project id is identified by its position in the request path, so the project id might not be a valid project id (i.e., it might not exist in Keystone). * swiftlm.access.host.operation.get.bytes - The number of object bytes read by clients through this host - Dimensions: * service: object-storage - Value Class: Value - Value Meta: None - Description This metric is the number of bytes read from objects in GET requests processed by this host during the last minute. Only successful GET requests to objects are counted. GET requests to the account or container are not included. * swiftlm.access.host.operation.project.get.bytes - The number of object bytes read by clients through this host for a specific project - Dimensions: * tenant_id: project id * service: object-storage - Value Class: Value - Value Meta: None - Description This metric is the number of bytes read from objects in GET requests processed by this host for a given project during the last minute. Only successful GET requests to objects are counted. GET requests to the account or container are not included. * swiftlm.access.host.operation.put.bytes - The number of object bytes written by clients through this host - Dimensions: * service: object-storage - Value Class: Value - Value Meta: None - Description This metric is the number of bytes written to objects in PUT or POST requests processed by this host during the last minute. Only successful requests to objects are counted. Requests to the account or container are not included.
* swiftlm.access.host.operation.project.put.bytes - The number of object bytes written by clients through this host for a specific project - Dimensions: * tenant_id: project id * service: object-storage - Value Class: Value - Value Meta: None - Description This metric is the number of bytes written to objects in PUT or POST requests processed by this host for a given project during the last minute. Only successful requests to objects are counted. Requests to the account or container are not included. * swiftlm.access.host.operation.status - The status of the swiftlm-access-log-tailer program - Dimensions: * service: object-storage - Value Class: Status - Value Meta: A message indicating the problem - Description This metric reports whether the swiftlm-access-log-tailer program is running normally.07070100000020000081A4000003E800000064000000015BE06E0300002E8B000000000000000000000000000000000000004900000000python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source/aggregate_scout.rst .. code:: (c) Copyright 2015-2016 Hewlett Packard Enterprise Development LP (c) Copyright 2017 SUSE LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Scout and Aggregation Utility (swiftlm-aggregate) ================================================= Introduction ------------ The swiftlm-aggregate program is run as a regular cron job. It gathers data using the Swift recon mechanism from all nodes. It then aggregates the data and generates appropriate metrics for the aggregated data. The metrics are written as a json file.
This file can then be consumed by the Monasca Swiftlm Plugin. .. _swiftlm-aggregate-metrics: Metrics Produced by swiftlm-aggregate ------------------------------------- The Monasca Agent will add additional dimensions such as: * cluster * control_plane These are not listed in the dimensions below. * swiftlm.async_pending.cp.total.queue_length - Reports the total length of the async pending queue of all object server nodes - Option: --async - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description This metric reports the total length of all async pending queues in the system. When a container update fails, the update is placed on the async pending queue. An update may fail because the container server is too busy or because the server is down or failed. Later the system will "replay" updates from the queue -- so eventually, the container listings will show all objects known to the system. If you know that container servers are down, it is normal to see the value of async pending increase. Once the server is restored, the value should return to zero. A non-zero value may also indicate that containers are too large. Look for "lock timeout" messages in /var/log/swift/swift.log. If you find such messages, consider reducing the container size or enabling rate limiting. * swiftlm.diskusage.cp.total.size - Is the total raw size of all drives in the system. - Option: --diskusage - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description Is the raw size in bytes of all drives in the system. * swiftlm.diskusage.cp.total.avail - Is the total available size of all drives in the system.
- Option: --diskusage - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description Is the size in bytes of available (unused) space of all drives in the system. Only drives used by Swift are included. * swiftlm.diskusage.cp.total.used - Is the total used size of all drives in the system. - Option: --diskusage - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description Is the size in bytes of used space of all drives in the system. Only drives used by Swift are included. * swiftlm.diskusage.cp.avg.usage - Is the average utilization of all drives in the system - Option: --diskusage - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description Is the average utilization of all drives in the system. The value is a percentage (example: 30.0 means 30% of the total space is used). * swiftlm.diskusage.cp.min.usage - Is the lowest utilization of all drives in the system - Option: --diskusage - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description Is the lowest utilization of all drives in the system. The value is a percentage (example: 10.0 means at least one drive is 10% utilized) * swiftlm.diskusage.cp.max.usage - Is the highest utilization of all drives in the system - Option: --diskusage - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description Is the highest utilization of all drives in the system. The value is a percentage (example: 80.0 means at least one drive is 80% utilized). 
The value is just as important as swiftlm.diskusage.cp.avg.usage. For example, if swiftlm.diskusage.cp.avg.usage is 70%, you might think that there is plenty of space available. However, if swiftlm.diskusage.cp.max.usage is 100%, this means that some objects cannot be stored on that drive. Swift will store replicas on other drives. However, this will create extra overhead. * swiftlm.md5sum.cp.check.ring_checksums - Checks if rings are consistent - Option: --md5sum - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Status (value=0 is OK; 2 is Failed) - Value Meta: msg * Rings are consistent on all hosts Ok * Checksum or number of rings not the same on all hosts The same set of rings is not present on all hosts. - Description If you are in the middle of deploying new rings, it is normal for this to be in the failed state. However, if you are not in the middle of a deployment, you need to investigate the cause. Use "swift-recon --md5 -v" to identify the problem hosts. * swiftlm.replication.cp.avg.account_duration - Is the average time for the account replicator to complete a scan - Option: --replication - Dimensions: * observer_host: name of host doing the aggregation * service: object-storage * component: account-replicator - Value Class: Value - Value Meta: None - Description This is the average across all servers for the account replicator to complete a cycle. As the system becomes busy, the time to complete a cycle increases. The value is in seconds. * swiftlm.replication.cp.avg.container_duration - Is the average time for the container replicator to complete a scan - Option: --replication - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage * component: container-replicator - Value Class: Value - Value Meta: None - Description This is the average across all servers for the container replicator to complete a cycle.
As the system becomes busy, the time to complete a cycle increases. The value is in seconds. * swiftlm.replication.cp.avg.object_duration - Is the average time for the object replicator to complete a scan - Option: --replication - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage * component: object-replicator - Value Class: Value - Value Meta: None - Description This is the average across all servers for the object replicator to complete a cycle. As the system becomes busy, the time to complete a cycle increases. The value is in seconds. * swiftlm.replication.cp.max.account_last - Is the age of the oldest account replicator that completed a scan - Option: --replication - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage * component: account-replicator - Value Class: Value - Value Meta: None - Description This is the number of seconds since the account replicator last completed a scan on the host that has the oldest completion time. Normally the replicator runs periodically and hence this value will decrease whenever a replicator completes. However, if a replicator is not completing a cycle, this value increases (by one second for each second that the replicator is not completing). If the value remains high and increasing for a long period of time, it indicates that one of the hosts is not completing the replication cycle. * swiftlm.replication.cp.container_last - Is the age of the oldest container replicator that completed a scan - Option: --replication - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage * component: container-replicator - Value Class: Value - Value Meta: None - Description This is the number of seconds since the container replicator last completed a scan on the host that has the oldest completion time.
Normally the replicator runs periodically and hence this value will decrease whenever a replicator completes. However, if a replicator is not completing a cycle, this value increases (by one second for each second that the replicator is not completing). If the value remains high and increasing for a long period of time, it indicates that one of the hosts is not completing the replication cycle. * swiftlm.replication.cp.object_last - Is the age of the oldest object replicator that completed a scan - Option: --replication - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description This is the number of seconds since the object replicator last completed a scan on the host that has the oldest completion time. Normally the replicator runs periodically and hence this value will decrease whenever a replicator completes. However, if a replicator is not completing a cycle, this value increases (by one second for each second that the replicator is not completing). If the value remains high and increasing for a long period of time, it indicates that one of the hosts is not completing the replication cycle. * swiftlm.load.cp.avg.five - Is the average five minute load average of all hosts in the system - Option: --load - Dimensions: * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description This is the averaged value of the five-minute system load average of all nodes in the Swift system. * swiftlm.load.cp.max.five - Is the maximum five minute load average of all hosts in the system - Option: --load - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description This is the five minute load average of the busiest host in the Swift system.
* swiftlm.load.cp.min.five - Is the minimum five minute load average of all hosts in the system - Option: --load - Dimensions: * hostname: set to "_" * observer_host: name of host doing the aggregation * service: object-storage - Value Class: Value - Value Meta: None - Description This is the five minute load average of the least loaded host in the Swift system.07070100000021000081A4000003E800000064000000015BE06E030000032A000000000000000000000000000000000000003D00000000python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source/conf.py# # (c) Copyright 2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8' # The master toctree document. master_doc = 'index'07070100000022000081A4000003E800000064000000015BE06E0300000EFE000000000000000000000000000000000000003F00000000python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source/index.rst .. code:: (c) Copyright 2015-2016 Hewlett Packard Enterprise Development LP (c) Copyright 2017 SUSE LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Swiftlm Documentation ===================== Introduction ------------ The swiftlm project contains a number of features designed to help manage and monitor Swift. The main components in swiftlm are as follows: * swiftlm-scan The swiftlm-scan utility comprises a number of checks and metric-measurement functions. When run, it scans the system and generates a list of metrics. The metrics are encoded in JSON. The format and layout of the metrics are designed to be compatible with Monasca. However, since the data is encoded as JSON, it is possible to integrate the results with other monitoring systems. * swiftlm-aggregate This uses the recon mechanism to gather system-wide data and create metrics for consumption by the Monasca-agent plugin. * swiftlm-uptime-monitor The swiftlm-uptime-monitor program monitors the VIP of a Swift system and determines if the system is responding. It measures the uptime and latency of the Swift service. * swiftlm-access-log-tailer The swiftlm-access-log-tailer scans the swift.log looking for records from the swift-proxy-server that relate to API requests made by clients. It counts these requests to work out the operation rate and number of object bytes read and written. It reports these as totals and by project id. Since this data relates to a given host, the metrics only make sense when aggregated across all Swift proxy server hosts. * Monasca Agent for swiftlm This is a Monasca-Agent plug-in. Its purpose is to report metrics generated by swiftlm-scan and swiftlm_uptime_mon to Monasca.
* swiftlm-ring-supervisor This utility is used to build and manage rings. The swiftlm-ring-supervisor is tightly integrated with the Ardana OpenStack Lifecycle Manager (Ardana) data model. The concept is that you provide a declarative description of your cloud (the input model) and the swiftlm-ring-supervisor will figure out the appropriate ring changes so that the Swift system uses the cloud resources as specified in the input model. Metric Information ------------------ The metrics produced by swiftlm are described in: * :ref:`swiftlm-scan-metrics` * :ref:`swiftlm-uptime-mon-metrics` * :ref:`swiftlm-aggregate-metrics` * :ref:`swiftlm-access-log-tailer-metrics` Developer Information --------------------- * `Standalone Script Setup `_ * `Monasca Plugin `_ * `Developing swiftlm-scan Checks `_ Building these documents ------------------------ You can build HTML versions of the docs with:: tox -e docs Point your browser at this URL, where ```` is where the swiftlm repository is checked out:: file:///home//swiftlm/doc/build/html/index.html To test correctness of an rst file, run rst2html.py as follows:: rst2html < blah.rst > ../build/blah.html The rst2html.py utility is in docutils. Install as follows:: pip install docutils Table of Contents ------------------ .. toctree:: :maxdepth: 2 monasca_plugin swiftlm_scan_metrics swiftlm_uptime_mon_metrics aggregate_scout test_runner standalone_scripts 07070100000023000081A4000003E800000064000000015BE06E0300002A2D000000000000000000000000000000000000004800000000python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source/monasca_plugin.rst .. code:: (c) Copyright 2015-2016 Hewlett Packard Enterprise Development LP (c) Copyright 2017 SUSE LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Swiftlm custom plugins for monasca agent ======================================== Introduction ------------ There are two steps to installing a custom plugin: install a detect plugin and install a check plugin. The detect plugin ----------------- A custom detect plugin is installed in `/usr/lib/monasca/agent/custom_detect.d`. This detect class gets called when the `monasca-setup` command is run specifying the class name with its `-d` option. This is done by the ansible task `monasca_agent_plugin `_ The detect plugin can be passed args from the `monasca-setup` cli or from an ansible task. GOTCHA: as far as I can tell the args values cannot contain whitespace. The detect plugin implements a method that should generate config for the actual check plugin. This config is then used to create a yaml file in `/etc/monasca/agent/conf.d`. The name of this yaml file must match the name of the check plugin module described next. The detect plugin generates config for each instance of a check - each instance may have different config. For the `swiftlm` check we generate a single instance of the check that runs all the `swiftlm` tasks, where a task is for example `connectivity` or `drive-audit`. An alternative design would generate multiple instances each configured to run one `swiftlm` task. Note: it is possible to create the yaml file directly but the recommended way is to use a detect class to avoid learning the yaml schema. The check plugin ---------------- A custom check plugin is installed in `/usr/lib/monasca/agent/custom_checks.d`.
The class in this module extends a monasca agent class, and implements a check method that is called each time the agent daemon wakes up. The check method gathers metrics from all the configured swiftlm tasks: - entry point (or 'plugin') tasks: entry points are loaded and called directly. These tasks are disabled by default because the entry points are not installed in the monasca-agent venv. - command line tasks: the `swiftlm-scan --format json` command is called for each command line task. These tasks are currently a hard-coded list but could be configured via the ansible task that deploys the detect plugin. - load file tasks: metrics are read from file(s) configured via the ansible task that deploys the detect plugin. The file should contain a json encoded list of metric dicts. Each task is assumed to return a single dict or a list of dicts. Each dict should have keys that match the keyword args expected by the monasca agent super-class gauge() method, which the plugin calls for each metric. Metrics that are reporting a severity level of OK (i.e. 0) may be suppressed, i.e. not reported to the monasca-agent. By default none are suppressed - the `suppress_ok` arg for the detect plugin may be used to configure suppression for specific command tasks. TBD: the OK suppression needs to distinguish between metrics whose value represents a severity level and metrics whose value represents a raw value. The plugin itself may generate metrics if a task fails e.g. if a file cannot be found or a command line task times out. (An alternative design would be to have individual check classes for each type of task (command line, load file etc.). However, it is not clear that the monasca agent supports multiple check classes in the same module since the name of the check config file in `/etc/monasca/agent/conf.d` must match the name of the *module* in `/usr/lib/monasca/agent/custom_checks.d`.
Consequently, each check class would need to be in a separate module, those modules would be unable to import from a common module or each other, and there would be no way to re-use common code between the check classes.) Metric uploading to monasca --------------------------- The monasca-agent collector daemon calls the plugin check method periodically (every 60 secs by default). It aggregates metrics and de-duplicates them. The monasca-agent forwarder sends metrics to a monasca service using monascaclient. Seeing it work -------------- Follow process for ardana-dev-tools install. Before running `ardana-deploy.yml` on the deployer vm: - comment out all services after swift, to save time. - Follow instructions here https://jira.hpcloud.net/browse/JAH-1871 to use vertica db with monasca. That workaround is currently required to have `monasca metric-list` return any results (see below). Then, on the deployer VM:: $ ansible-playbook -i hosts/verb_hosts monasca-cli-dev.yml \ --limit STANDARDBASE-CCP-T1-M1-NETCLM $ ssh STANDARDBASE-CCP-T1-M1-NETCLM $ source ~/monasca-env.sh Check swiftlm plugins source code modules are installed:: $ sudo ls -R /usr/lib/monasca/agent/custom_* /usr/lib/monasca/agent/custom_checks.d: swiftlm_check.py /usr/lib/monasca/agent/custom_detect.d: swiftlm_detect.py swiftlm_detect.pyc Check swiftlm_check is installed in agent conf.d:: $ sudo less /etc/monasca/agent/conf.d/swiftlm_check.yaml init_config: null instances: - dimensions: service: object-storage metrics_files: /var/cache/swift/swiftlm_uptime_monitor/uptime.stats name: swiftlm_check To inspect the monasca agent collector log:: $ sudo tail -f /var/log/monasca/agent/collector.log |grep swiftlm This relates to the daemon that periodically runs the checks. (NB 60 secs between updates) Default log level is WARN so you will see nothing if all is ok, but if one of the custom plugin tasks fails (e.g. uptime.stats file not found) then this will show up as a WARNING log message. 
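The load-file task format described earlier (a JSON-encoded list of metric dicts) can be sanity-checked before chasing collector warnings. A minimal sketch, assuming metric dicts carry `metric` and `value` keys with an optional `dimensions` dict; these field names are assumptions modelled on the gauge() keyword args, not a documented schema:

```python
import json


def check_metrics_file(path):
    """Verify the file holds JSON metric dict(s) as a load-file task expects.

    Accepts either a single dict or a list of dicts; returns the
    normalised list or raises ValueError with a hint about the problem.
    """
    with open(path) as f:
        metrics = json.load(f)
    if isinstance(metrics, dict):
        metrics = [metrics]            # a single dict is also acceptable
    if not isinstance(metrics, list):
        raise ValueError('expected a JSON list of metric dicts')
    for m in metrics:
        if not isinstance(m, dict):
            raise ValueError('each entry must be a dict: %r' % (m,))
        # Assumed required keys, modelled on the gauge() keyword args.
        for key in ('metric', 'value'):
            if key not in m:
                raise ValueError('metric dict missing %r: %r' % (key, m))
    return metrics
```

Point it at the configured metrics file (e.g. /var/cache/swift/swiftlm_uptime_monitor/uptime.stats) to confirm the file parses before investigating plugin task failures.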
To check the monasca agent forwarder log (this relates to calls from agent to monasca service):: $ sudo tail -f /var/log/monasca/agent/forwarder.log To set log level lower for collector.log:: $ sudo /opt/monasca/bin/monasca-setup -u monasca -p monasca \ --keystone_url=http://STANDARDBASE-CCP-T01-VIP-KEY-API-NETCLM:35357/v3 \ --project_name admin --log_level=DEBUG All reported swiftlm_check metrics are logged at DEBUG level. NB GOTCHA: running monasca-setup will cause the swiftlmdetect class to be reloaded, but without any args, so the uptime stats file arg will be lost on any host where it had been configured by the ansible playbook. So to manually reload the swiftlm_detect class and pass its args:: $ sudo /opt/monasca/bin/monasca-setup -d swiftlmdetect \ -a "metrics_files=/var/cache/swift/swiftlm_uptime_monitor/uptime.stats" NB: the name of the monasca detect plugin passed with the -d arg is the name of the python class, not the name of the python module file. The uptime stats file should be here:: $ sudo ls -l /var/cache/swift/swiftlm_uptime_monitor/uptime.stats Querying monasca using monasca cli ---------------------------------- Setup ``OS`` environment variables:: .
/home/stack/service.osrc To list metrics (NB: metric listing was broken at time of writing without the workaround to install vertica described here: https://jira.hpcloud.net/browse/JAH-1871):: $ monasca metric-list To list the monasca alarm definitions to check swiftlm alarms are installed:: $ monasca alarm-definition-list $ monasca alarm-definition-list |grep swiftlm To see detail of the swiftlm alarm definition(s), use alarm definition ID from above, e.g:: $ monasca alarm-definition-show a3089c39-ce65-402e-a1bf-61775b572c7f To see alarms:: $ monasca alarm-list |grep swiftlm To see measurements, in this case filtered for the swiftlm_check.task metrics that result from plugin task failures:: $ monasca measurement-list swiftlm_check.task 2015-07-08T12:58:48.000Z \ --merge_metrics --dimensions hostname=STANDARDBASE-CCP-T1-M1-NETCLM To see results of swift_services check for the last minute (-1 argument):: $ monasca measurement-list swiftlm.swift.swift_services -1 --merge_metrics \ --dimensions hostname=standard-ccp-c1-m1-mgmt The last minute might be too limiting, so also try with -2 (last two minutes). You can add dimensions and remove --merge_metrics as follows:: $ monasca measurement-list swiftlm.swift.swift_services -3 \ --dimensions hostname=standard-ccp-c1-m1-mgmt,component=account-server It is possible to manually create metrics using the `monasca metric-create` command. Querying monasca using swiftlm-monasca -------------------------------------- The main difference between the monasca CLI and swiftlm-monasca is that the measurement values and associated metrics are shown together. In addition, swiftlm-monasca will automatically retry operations and handle the API's pagination scheme. Setup ``OS`` environment variables:: .
/home/stack/service.osrc See metrics as they are posted (it allows 1-2 minutes to elapse before showing you the measurements):: swiftlm-monasca tail --dim=service:object-storage Or, to narrow:: swiftlm-monasca tail --metric_name=swiftlm.systems.check_mount \ --dim=hostname:standard-ccp-c1-m2-mgmt To find historical data (last 5 minutes until now):: swiftlm-monasca find --metric_name=swiftlm.systems.check_mount \ --dim=hostname:standard-ccp-c1-m2-mgmt \ --start_time=-5 \ --end_time=0 To get the average of swiftlm-uptime-mon latency for a given hour:: swiftlm-monasca aggregate --metric_name=swiftlm.avg_latency_sec \ --dim=component:rest-api \ --start_time=2015-11-24T16:00:00.000Z \ --end_time=2015-11-24T16:59:59.000Z Post a metric. If you post to a swiftlm metric, make sure you specify ALL dimensions -- otherwise you get into merge_metrics hell:: swiftlm-monasca aggregate --metric_name=swiftlm.test.test \ --dim=hostname:blah \ --dim=service:object-storage --value=10.0 \ --value_meta="msg:note the quotes here" python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source/standalone_scripts.rst .. code:: (c) Copyright 2015-2016 Hewlett Packard Enterprise Development LP (c) Copyright 2017 SUSE LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Standalone Script Setup ======================= Standalone scripts are placed in the `swiftlm/cli` directory.
To install the script it must be added to the `console_scripts` section in `setup.py`:: console_scripts=[ 'swiftlm-new-script = swiftlm.cli.new_script:main' ] Scripts must have the prefix `swiftlm` to be installed correctly. They should also have no file extension and use hyphens to separate words. After the install, the ansible playbooks will symlink anything created with the correct prefix to `/usr/local/bin`. This means that all the swift scripts will be in `$PATH` and can be quickly called from the command line. Scripts should be run as standalone programs:: $ swiftlm-drive-provision # Or if /usr/local/bin is not in $PATH $ /usr/local/bin/swiftlm-drive-provision Attempting to run the script with python will cause errors and should not be done:: $ python /usr/local/bin/swiftlm-drive-provision # This would cause the script to fail. python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source/swiftlm_scan_metrics.rst .. code:: (c) Copyright 2015-2016 Hewlett Packard Enterprise Development LP (c) Copyright 2017-2018 SUSE LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. swiftlm-scan ============ Introduction ------------ The swiftlm-scan program comprises a number of "checks". Each check produces one or more metrics. This document describes the metrics. The information is organised as follows: * Metric name This is the name of the metric being described.
This is the name that Monasca receives. * Check name This is the name of the check producing the metric. * Value Class This is one of the following. This is not a Monasca concept. We use it so that the table is less verbose. - Measurement -- this is used when the value of the metric reports a value. For example, the voltage of a 12V battery might be 11.98. - Status -- this is used when the value represents a state or status of something in the system. The following values are used: 0. Status is normal or ok 1. Status is in a warning state 2. Status is in a failed state 3. The state is unknown (cannot be determined or not applicable at the current time) Generally, a metric of this value class will also have a value_meta. * Dimensions. These are the dimensions as sent to Monasca Agent. Monasca Agent adds other dimensions such as cloud and control plane. For most metrics, it also sets the hostname dimension (for some metrics, the checks set the hostname to '_'). * Value_meta. This is optional. When present, it contains additional data in addition to the value of the metric. * Description. This provides a longer description of the values and meaning of the metric. * Troubleshooting/Resolution. This provides some suggestions for using and interpreting the metric values to troubleshoot and resolve problems on your system. .. _swiftlm-scan-metrics: Metrics Produced By swiftlm-scan -------------------------------- * swiftlm.diskusage.host.val.usage - Is the used % of a mounted filesystem - Check: --check-mounts - Dimensions: * hostname: set by Monasca Agent * service: object-storage * mount: the mountpoint of the filesystem - Value Class: Value - Value Meta: None - Description This metric reports the percent usage of a Swift filesystem.
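The percent-usage calculation, and the max/min/avg variants derived from it, are straightforward to sketch. The helpers below are illustrative only (not the actual swiftlm code; the real check may compute the percentage slightly differently):

```python
import os

def percent_used(path):
    """Percent of a filesystem used, 0.0 to 100.0, from statvfs counters."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize      # filesystem size in bytes
    avail = st.f_bavail * st.f_frsize      # bytes available to non-root users
    if total == 0:
        return 0.0
    return 100.0 * (total - avail) / total

def summarize(usages):
    """Derive the host-level max/min/avg metrics from per-mount usages."""
    return {
        'swiftlm.diskusage.host.max.usage': max(usages),
        'swiftlm.diskusage.host.min.usage': min(usages),
        'swiftlm.diskusage.host.avg.usage': sum(usages) / len(usages),
    }
```

For example, `summarize([10.0, 50.0, 30.0])` yields 50.0, 10.0 and 30.0 for the max, min and avg metrics respectively.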
The value is a floating point number in range 0.0 to 100.0 * swiftlm.diskusage.host.max.usage - Is highest used % of mounted filesystems on a host - Check: --check-mounts - Dimensions: * hostname: set by Monasca Agent * service: object-storage - Value Class: Value - Value Meta: None - Description This metric reports the percent usage of a Swift filesystem that is most used (full) on a host. The value is the max of the percentage used of all Swift filesystems. * swiftlm.diskusage.host.min.usage - Is lowest used % of mounted filesystems on a host - Check: --check-mounts - Dimensions: * hostname: name of host being reported * service: object-storage - Value Class: Value - Value Meta: None - Description This metric reports the percent usage of a Swift filesystem that is least used (has free space) on a host. The value is the min of the percentage used of all Swift filesystems. * swiftlm.diskusage.host.avg.usage - Is average used % of mounted filesystems on a host - Check: --check-mounts - Dimensions: * hostname: set by Monasca Agent * service: object-storage - Value Class: Value - Value Meta: None - Description This metric reports the average percent usage of all Swift filesystems on a host. * swiftlm.diskusage.host.val.used - Is the number of bytes used in a mounted filesystem - Check: --check-mounts - Dimensions: * hostname: set by Monasca Agent * service: object-storage * mount: the mountpoint of the filesystem - Value Class: Value - Value Meta: None - Description This metric reports the number of used bytes in a Swift filesystem. The value is an integer (units: Bytes) * swiftlm.diskusage.host.val.size - Is the size in bytes of a mounted filesystem - Check: --check-mounts - Dimensions: * hostname: set by Monasca Agent * service: object-storage * mount: the mountpoint of the filesystem - Value Class: Value - Value Meta: None - Description This metric reports the size in bytes of a Swift filesystem.
The value is an integer (units: Bytes) * swiftlm.diskusage.host.val.avail - Is the number of bytes available (free) in a mounted filesystem - Check: --check-mounts - Dimensions: * hostname: set by Monasca Agent * service: object-storage * mount: the mountpoint of the filesystem - Value Class: Value - Value Meta: None - Description This metric reports the number of bytes available (free) in a Swift filesystem. The value is an integer (units: Bytes) * swiftlm.systems.check_mounts - Reports the status of mounted Swift filesystems - Check: --check-mounts - Dimensions: * hostname: set by Monasca Agent * service: object-storage * mount: the mountpoint of the filesystem - Value Class: Status - Value Meta: * `{device} mounted at {mount} ok` Normal, ok state, for example:: /dev/sdc1 mounted at /srv/node/disk0 ok * `{device} not mounted at {mount}` * `{device} mounted at {mount} has permissions {permissions} not 755` * `{device} mounted at {mount} is not owned by swift, has user: {user}, group: {group}` * `{device} mounted at {mount} has invalid label {label}` * `{device} mounted at {mount} is not XFS` * `{device} mounted at {mount} is corrupt` - Description This metric reports the mount state of each drive that should be mounted on this node. You can attempt to remount by logging into the node and running the following command:: sudo swiftlm-drive-provision --mount * swiftlm.systems.connectivity.memcache_check - Reports if a proxy server can connect to memcached on other servers - Check: --connectivity - Dimensions: * observer_host: the host reporting the metric. * url: The network-name/port of the remote memcached * service: object-storage * hostname: set to '_' - Value Class: Status - Value Meta: See swiftlm.systems.connectivity.connect_check - Description This metric reports if memcached on the host as specified by the url dimension is accepting connections from the host running the check (observer_host).
* swiftlm.systems.connectivity.connect_check - Reports if a Swift server can connect to a VIP used by the Swift service - Check: --connectivity - Dimensions: * observer_host: the host that is able/unable to connect to the VIP * url: the URL of the endpoint being checked * service: object-storage * hostname: set to '_' - Value Class: Status - Value Meta: The following value_meta.msg are used: * `: ok` We successfully connected to on port * `: [Errno -2] Name or service not known` This should not normally happen since endpoints are usually resolved in /etc/hosts. * `: [Errno -2] timed out` Timed out connecting to the VIP. The service may be unresponsive or there may be a network connectivity problem. * `: [Errno 111] Connection refused` Usually indicates that the service (or haproxy or load balancer) is not running. * `: ` As per * `: check thread did not complete` This indicates that the thread connecting to the endpoint did not exit in time. There may be a problem with one of the other threads -- not necessarily a problem with this endpoint. - Description This metric reports if a server can connect to a VIP. Currently the following VIPs are checked: * The Keystone VIP used to validate tokens (normally port 5000) The check simply connects to the address and port. It does not attempt to send data. - Troubleshooting/Resolution If the Keystone service stops working, all Swift proxy servers will report a connection failure. Restoring the Keystone service will resume normal operations. If a single Swift proxy server is reporting a problem, you should investigate the connectivity of that server. Since this server can no longer validate tokens, your users will get (apparently random) 401 responses. Consider stopping swift-proxy-server on that node until you determine why it cannot connect to the Keystone service.
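A connect check of this kind reduces to a TCP connect with a timeout, mapping the outcome onto the Status value class. The sketch below is illustrative (the helper name is hypothetical and the message strings only approximate the value_meta formats listed above):

```python
import socket

def connect_check(host, port, timeout=5.0):
    """Return (status, msg): 0 = ok, 2 = failed, per the Status value class."""
    try:
        # create_connection performs the TCP handshake only; no data is sent,
        # matching the behaviour described for the connect check.
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return (0, '%s:%d ok' % (host, port))
    except socket.timeout:
        return (2, '%s:%d timed out' % (host, port))
    except OSError as err:
        # Covers "Connection refused", "Name or service not known", etc.
        return (2, '%s:%d %s' % (host, port, err))
```

A refused connection (service not running) and a DNS failure both surface as `OSError`, which is why the value_meta messages above carry the errno text.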
* swiftlm.systems.connectivity.rsync_check - Reports if a proxy server can connect to rsyncd on other servers - Check: --connectivity - Dimensions: * observer_host: the host reporting the metric. * url: the network-name/port of the remote rsyncd * service: object-storage * hostname: set to '_' - Value Class: Status - Value Meta: See swiftlm.systems.connectivity.connect_check - Description This metric reports if rsyncd on the host as specified by the url dimension is accepting connections from the host running the check (observer_host). * swiftlm.systems.ntp NOT IMPLEMENTED - Reports if NTP is running on the server. - Check: --ntp - Dimensions: * hostname: set by Monasca Agent * service: object-storage * error: Text of any error messages that occur - Description This metric reports if NTP is running on the host. The host uses `systemctl status` to determine this. - Value Class: Status - Value Meta: The following value_meta.msg are used: * `OK` NTP is running. * `ntpd not running: ` NTP was not running. Error is the text returned by systemctl which may help diagnose the issue. - Troubleshooting/Resolution When NTP is not running XXX * swiftlm.systems.ntp.stratum NOT IMPLEMENTED - Reports the stratum level of NTP - Check: --ntp - Dimensions: * hostname: set by Monasca Agent * service: object-storage - Description This metric's value will be the stratum level of the current server. This is determined using the output of `ntpq -pcrv`. - Troubleshooting/Resolution When the stratum level increases, this indicates that time is being received from less accurate sources. Ensure that the configured master NTP servers are up and that no other servers have been added to the time reference list. All servers should be within +/-1 stratum level of each other at most. * swiftlm.systems.ntp.offset NOT IMPLEMENTED - Reports the offset between the system clock and the reported NTP time.
- Check: --ntp - Dimensions: * hostname: set by Monasca Agent * service: object-storage - Description This metric's value will be the offset between the system clock and the NTP time. This is determined using the output of `ntpq -pcrv`. - Troubleshooting/Resolution A high offset means that the server isn't adjusting its time correctly or that its hardware clock is malfunctioning. If the clock is battery backed, it could be at a low power level. * swiftlm.swift.file_ownership.config - Reports if Swift configuration files have the appropriate owner - Check: --file-ownership - Dimensions: * hostname: set by Monasca Agent * service: object-storage - Value Class: Status - Value Meta: If multiple errors are found, only the first error is shown in the value meta as follows: * `OK` - no errors * `Path: is not owned by swift` * `Path: {path} should not be empty` * `Path: {path} is missing` - Description This metric reports if a directory or file has the appropriate owner and other attributes. - Troubleshooting/Resolution Improper ownership of configuration files may be due to manual editing or copy of files. Re-running the configuration process may resolve the problem. If not, check that the file is a configuration file that is actually used by Swift. If not, consider deleting or moving it. * swiftlm.swift.file_ownership.data - Reports if Swift mountpoints (/srv/node/disk) have the appropriate owner - Check: --file-ownership - Dimensions: * hostname: set by Monasca Agent * service: object-storage - Value Class: Status - Value Meta: If multiple errors are found, only the first error is shown in the value meta as follows: * `OK` - no errors * `Path: is not owned by swift` * `Path: {path} is missing` - Description Improper ownership of top-level directories on mounted filesystems may be due to insertion of a disk drive that belongs to a different system. The Swift processes will be unable to write accounts, containers or objects to the filesystems.
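An ownership check of this kind is essentially a stat plus a uid-to-name lookup. A minimal sketch, assuming the expected owner is the `swift` user as stated above (the helper name and message formats are illustrative, not the swiftlm implementation):

```python
import os
import pwd

def check_owner(path, expected='swift'):
    """Return (status, msg): 0 if path is owned by the expected user, else 2."""
    try:
        st = os.stat(path)
    except OSError:
        return (2, 'Path: %s is missing' % path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    if owner != expected:
        return (2, 'Path: %s is not owned by %s (owner: %s)'
                % (path, expected, owner))
    return (0, 'OK')
```

The real check also inspects other attributes (permissions, emptiness), but the owner comparison above is the core of it.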
You should stop all Swift processes and perform a chown of all files on the filesystem to correct the problem. There is a special case: the directory /srv/node/disk is owned by the root user. This happens when a filesystem fails to mount -- and so we see the ownership of the mount point -- not the mounted filesystem root directory. * swiftlm.swift.replication.object.last_replication, swiftlm.swift.replication.container.last_replication, swiftlm.swift.replication.account.last_replication - Reports how long it has been since the replicator last finished a replication run. The replicator in question is indicated in the metric name. - Check: --replication - Dimensions: * hostname: set by Monasca Agent * service: object-storage * component: account-replicator or container-replicator or object-replicator - Value Class: Measurement - Value Meta: * None - Description This reports how long (in seconds) it has been since the replicator process last finished a replication run. If the replicator is stuck, the time will keep increasing forever. The time a replicator normally takes depends on disk sizes and how much data needs to be replicated. However, a value over 24 hours is generally bad. - Troubleshooting/Resolution The replicator might be stuck (XFS filesystem hang or other issue). Restart the process in question. For example, to restart the object-replicator:: sudo systemctl restart swift-object-replicator * swiftlm.swift.drive_audit - Reports the status from the swift-drive-audit program - Check: --drive-audit - Dimensions: * hostname: set by Monasca Agent * service: object-storage * mount_point: the mountpoint of the filesystem - Value Class: Status - Value Meta: * `No errors found on device mounted at: /srv/node/disk0` No errors were found * `Errors found on device mounted at: /srv/node/disk0` Errors were found in the kernel log - Description If an unrecoverable read error (URE) occurs on a filesystem, the error is logged in the kernel log.
The swift-drive-audit program scans the kernel log looking for patterns indicating possible UREs. To get more information, log onto the node in question and run:: sudo swift-drive-audit /etc/swift/drive-audit.conf UREs are common on large disk drives. They do not necessarily indicate that the drive has failed. You can use the xfs_repair command to attempt to repair the filesystem. Failing this, you may need to wipe the filesystem. If UREs occur very often on a specific drive, this may indicate that the drive is about to fail and should be replaced. * swiftlm.swift.swift_services - Reports if a Swift process (daemon/server) is running or not - Check: --swift-services - Dimensions: * hostname: set by Monasca Agent * service: object-storage * component: the process (daemon/server) being reported - Value Class: Status - Value Meta: * ` is running` The named process is running. * ` is not running` The named process has stopped. - Description This metric reports whether the process named in the component dimension and the msg value_meta is running or not. Use the swift-start.yml playbook to attempt to restart the stopped process (it will start any process that has stopped -- you don't need to specifically name the process).
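The last_replication metrics described earlier reduce to "seconds since a recorded completion timestamp", compared against the 24-hour rule of thumb. A sketch (the helper names are illustrative; only the 24-hour threshold comes from the description above):

```python
import time

DAY = 24 * 60 * 60  # the "value over 24 hours is generally bad" threshold

def replication_age(last_finished, now=None):
    """Seconds since the replicator last completed a run."""
    if now is None:
        now = time.time()
    return now - last_finished

def is_stuck(last_finished, now=None, threshold=DAY):
    """True when the age exceeds the threshold, suggesting a stuck replicator."""
    return replication_age(last_finished, now) > threshold
```

Because the metric is just "now minus last completion", a stuck replicator shows up as a monotonically growing value rather than an explicit error.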
* swiftlm.swift.swift_services.check_ip_port - Reports if a service is listening to the correct ip and port - Dimensions: * hostname: set by Monasca Agent * service: object-storage * component: the process (daemon/server) being reported - Value Class: Status - Value Meta: * `ok` is listening to the correct ip and port * ` is not listening to the correct ip or port` is not listening to the correct ip or port - Description This metric reports whether or not rsync is listening to the correct ip or port * swiftlm.load.host.val.five - Is the 5 minute load average of a host - Check: --system - Dimensions: * hostname: set by Monasca Agent * service: object-storage - Value Class: Value - Value Meta: None - Description This metric reports the 5 minute load average of a host. The value is derived from /proc/loadavg. * swiftlm.hp_hardware.ssacli.smart_array.firmware - Is the firmware version of a component of a Smart Array controller - Check: --ssacli - Dimensions: * hostname: set by Monasca Agent * service: object-storage * component: Type of component this metric applies to. One of * controller: firmware version reported relates to the controller * model: The component model. Example: "Smart Array P410" * controller_slot: Slot number of controller - Value Class: Value - Value Meta: None - Description This metric reports the firmware version of a component of a Smart Array controller. * swiftlm.hp_hardware.ssacli.smart_array - Reports the status of a Smart Array component - Check: --ssacli - Dimensions: * hostname: set by Monasca Agent * service: object-storage * component: Type of component this metric applies to. One of: * controller: The sub-component is a component of the controller * sub_component: Sub component this metric applies to. 
One of the following, where component is "controller": * controller_status: the status of the controller is being reported * battery_capacitor_presence: the presence/absence of battery/capacitor is being reported * battery_capacitor_status: the status of battery/capacitor is being reported * cache_status: the status of the cache is being reported * controller_not_hba_mode: whether the controller is in HBA mode (hopefully, not) * model: The component model. Example: "Smart Array P410" * controller_slot: Slot number of controller - Value Class: Status - Value Meta: * `OK` No errors were found * `controller_status: status is ` The (controller, cache, etc.) is in a failed/error status (as indicated by the value) * Controller is in HBA mode; performance will be poor The controller is in HBA mode. This means that the cache is not used and hence performance of disk drives will be poor. - Description This reports the status of various sub-components of a Smart Array Controller. A failure is considered to have occurred if: * Controller has failed * Cache is not enabled or has failed * Battery or capacitor is not installed * Battery or capacitor has failed * swiftlm.hp_hardware.ssacli.physical_drive - Reports the status of a Smart Array disk drive - Check: --ssacli - Dimensions: * hostname: set by Monasca Agent * service: object-storage * component: Is "physical_drive" * controller_slot: Slot number of controller * box: Box number of the drive * bay: Bay number of the drive - Value Class: Status - Value Meta: * `OK` No errors were found * `Drive : : has status: ` The disk drive identified by serial number, box and bay number has failed with a status as indicated by the value. On some Smart Arrays, the box/bay numbers are not deterministic so the serial number should be used to accurately determine the identity of a failed drive. - Description This reports the status of a disk drive attached to a Smart Array controller.
* swiftlm.hp_hardware.ssacli.logical_drive - Reports the status of a Smart Array LUN - Check: --ssacli - Dimensions: * hostname: set by Monasca Agent * service: object-storage * component: Is "logical_drive" * controller_slot: Slot number of controller * array: The name of the array of which this logical drive is a member * logical_drive: The identity of the logical drive * sub_component: One of: * lun_status: the metric reports the LUN status * cache_status: the metric reports the cache status of the LUN - Value Class: Status - Value Meta: * `OK` No errors were found * `Logical Drive has status: ` The logical drive has failed with a status as indicated by the value. * `Logical Drive has cache status: ` The logical drive cache is not enabled and working. Instead it has a status as indicated by the value. - Description This reports the status of a LUN presented by a Smart Array controller. A LUN is considered failed if the LUN has failed or if the LUN cache is not enabled and working. * swiftlm.swiftlm_check - Reports status of the Swiftlm Monasca-Agent Plug-in - Check: Generated by plug-in code - Dimensions: * hostname: set by Monasca Agent * service: object-storage - Value Class: Status - Value Meta: * `OK` The plug-in is working normally. * `file stale metrics` The file contains old metrics, i.e., the file does not appear to be updating. This can indicate that the program that generates the metrics has stopped running. The Swift Uptime Monitor is an example of such a program. * Other error messages The error message indicates the nature of the problem. - Description This indicates if the Swiftlm Monasca Agent Plug-in is running normally. If the status is failed, it is probable that some or all metrics are no longer being reported.
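The "file stale metrics" condition is essentially an mtime check on the metrics file. A sketch of the idea (the two-minute staleness threshold and helper name are assumptions for illustration, not the plug-in's actual rule):

```python
import os
import time

def file_is_stale(path, max_age=120, now=None):
    """True when the metrics file has not been updated within max_age seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return True  # a missing file counts as stale
    return (now - mtime) > max_age
```

A file that the producing program (e.g. the Swift Uptime Monitor) has stopped rewriting will eventually fail this check, which is what triggers the `file stale metrics` value_meta.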
* swiftlm.check.failure - Reports status of a swiftlm-scan check if an exception is raised - Check: Generated by plug-in code - Dimensions: * check: Name of the swiftlm-scan plugin * error: The error output from the plugin * component: swiftlm-scan * service: object-storage - Value Class: Status - Value Meta: * `OK` The plug-in is working normally. * ` failed with ` The check is the name of the swiftlm-scan plugin which was executing and raised an exception. The error is the text of the exception. Examples of the ssacli plugin exception message are: * `swiftlm.hp_hardware.ssacli failed with: Traceback (most recent call last):...ssacli ctrl all show detail failed with exit code: 123` The controller returned a non-zero exit code with error string of and an exit code of * `swiftlm.hp_hardware.ssacli failed with: Traceback (most recent call last):...ssacli ctrl slot= pd all show detail failed with exit code: 123` The controller returned a non-zero exit code with error string of and an exit code of * `swiftlm.hp_hardware.ssacli failed with: Traceback (most recent call last):...ssacli ctrl slot= ld all show detail failed with exit code: 123` The controller returned a non-zero exit code with error string of and an exit code of * Other error messages The error output and exit code indicate the nature of the problem. - Description The total exception string is truncated if longer than 1919 characters and an ellipsis is placed in the first three characters of the message. If there is more than one error reported, the list of errors is pared down to the last reported error and the operator is expected to resolve failures until no more are reported. Where there are no further reported errors, the Value Class is emitted as 'Ok'. python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source/swiftlm_uptime_mon_metrics.rst ..
code:: (c) Copyright 2015-2016 Hewlett Packard Enterprise Development LP (c) Copyright 2017 SUSE LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. swiftlm-uptime-monitor ====================== The swiftlm-uptime-monitor program monitors the VIP of a Swift system and determines if the system is responding. It measures the latency of the Keystone authentication process and the Swift service in two ways. The first check is the latency of a single successful keystone authentication round trip, followed by N iterations of an object store put-get-delete sequence, then N iterations of queries to the object store healthcheck REST API function. Introduction ------------ The swiftlm-uptime-monitor program emits a number of metrics corresponding to the data it monitors. The metric information is organised as follows: * Metric name This is the name of the metric being described. This is the name that Monasca receives. * Value Class This is one of the following. This is not a Monasca concept. We use it so that the table is less verbose. - Measurement -- this is used when the value of the metric reports a value. For example, the voltage of a 12V battery might be 11.98. - Status -- this is used when the value represents a state or status of something in the system. The following values are used: :: 0. Status is normal or ok 1. Status is in a warning state 2. Status is in a failed state 3.
The state is unknown (cannot be determined or not applicable at the current time) Generally, a metric of this value class will also have a value_meta. * Dimensions. These are the dimensions as sent to Monasca. * Value_meta. This is optional. When present, it contains additional data in addition to the value of the metric. * Description. This provides a longer description of the values and meaning of the metric. * Troubleshooting/Resolution. This provides some suggestions for using and interpreting the metric values to troubleshoot and resolve problems on your system. .. _swiftlm-uptime-mon-metrics: Metrics Produced by swiftlm-uptime-monitor ------------------------------------------ I am assuming that, as mentioned in the aggregate_scout.rst doc, the Monasca Agent will add the "control_plane" dimension so it is not listed in the dimensions below. * swiftlm.umon.target.min.latency_sec Reports the minimum latency recorded in a sequence of response time checks of a component. - Dimensions: * component: the API being checked * keystone: The Keystone API (used to get an auth token) * rest-api: The Swift object-storage API (object put, get, and delete) * healthcheck: The /healthcheck REST call of the Swift API * hostname: Set to '_' * observer_host: The host reporting the metric. * service: object-storage * url: the base URL of the REST service called - Value Class: The minimum value of response latency in seconds - Description This metric reports the minimum response time in seconds of a REST call from the observer to the component REST API listening on the reported host. - Troubleshooting/Resolution. The troubleshooting and resolution steps heavily depend on the nature of the failure. The keystone service could be stopped, in which case none of these checks will pass. Look through the $HOME/keystone.osrc file for the configured OS_AUTH_URL and verify that the keystone port is listening.
For Swift Object Store and Swift Object Store Healthcheck, make sure that Swift services are active. * swiftlm.umon.target.max.latency_sec Reports the maximum latency recorded in a sequence of response time checks of a component. - Dimensions: * component: the API being checked * keystone: The Keystone API (used to get an auth token) * rest-api: The Swift object-storage API (object put, get, and delete) * healthcheck: The /healthcheck REST call of the Swift API * hostname: Set to '_' * observer_host: The host reporting the metric. * service: object-storage * url: the base URL of the REST service called - Value Class: The maximum value of response latency in seconds - Description This metric reports the maximum response time in seconds of a REST call from the observer to the component REST API listening on the reported host - Troubleshooting/Resolution: The troubleshooting and resolution steps heavily depend on the nature of the failure. The keystone service could be stopped, in which case none of these checks will pass. Look through the $HOME/keystone.osrc file for the configured OS_AUTH_URL and verify that the keystone port is listening. For Swift Object Store and Swift Object Store Healthcheck, make sure that Swift services are active. * swiftlm.umon.target.avg.latency_sec This metric reports the average value of the last N-iterations of latency measurements which have been recorded for a component. - Dimensions: * component: the API being checked * keystone: The Keystone API (used to get an auth token) * rest-api: The Swift object-storage API (object put, get, and delete) * healthcheck: The /healthcheck REST call of the Swift API * hostname: Set to '_' * observer_host: The host reporting the metric. * service: object-storage * url: the base URL of the REST service called - Value Class: The average value in seconds for N-iterations of response latency measures - Description Reports the average value of N-iterations of the latency values recorded for a component. 
- Troubleshooting/Resolution: The troubleshooting and resolution steps heavily depend on the nature of the failure. The keystone service could be stopped, in which case none of these checks will pass. Look through the $HOME/keystone.osrc file for the configured OS_AUTH_URL and verify that the keystone port is listening. For Swift Object Store and Swift Object Store Healthcheck, make sure that Swift services are active. * swiftlm.umon.target.check.state This metric reports the state of the last completed check of the component. - Dimensions: * component: the API being checked * keystone: The Keystone API (used to get an auth token) * rest-api: The Swift object-storage API (object put, get, and delete) * healthcheck: The /healthcheck REST call of the Swift API * hostname: Set to '_' * observer_host: The host reporting the metric. * service: object-storage * url: the base URL of the REST service called - Value Class: This is the status of the metric, reported as one of the following states: :: 0. Status is normal or ok 1. Status is in a warning state 2. Status is in a failed state 3. The state is unknown (cannot be determined or not applicable at the current time) This metric does not report a value_meta on an 'ok' state. The failed or unknown state reports the http return error. - Description This metric reports the state of each component after N iterations of checks. If the initial check succeeds, the checks move onto the next component until all components are queried, then the checks sleep for 'main_loop_interval' seconds. If a check fails, it is retried every second for 'retries' number of times per component. If the check fails 'retries' times, it is reported as a fail instance. - Troubleshooting/Resolution: The troubleshooting and resolution steps heavily depend on the nature of the failure. The failing component should immediately report a failed state with the associated value_meta to give a hint to the root cause.
Therefore the resolution efforts for this metric will be focused upon the failing latency metric as described above. * swiftlm.umon.target.val.avail_minute Reports whether the Object Store rest-api is responding. - Dimensions: * component: the API being checked * rest-api: The Swift object-storage API (object put, get, and delete) * hostname: Set to '_' * observer_host: The host reporting the metric. * service: object-storage * url: the base URL of the REST service called - Value Class: Either 100 for success, or 0 for fail. - Description A value of 100 indicates that swift-uptime-monitor was able to get a token from Keystone and was able to perform operations against the Swift API during the reported minute. A value of zero indicates that either Keystone or Swift failed to respond successfully. A metric is produced every minute that swift-uptime-monitor is running. - Troubleshooting/Resolution: This metric reports a summarized state of the uptime monitor metrics; its troubleshooting and resolution therefore follow from resolving the root causes identified by the latency and state metrics above. * swiftlm.umon.target.val.avail_day Reports the average of the last 24 hours of per-minute availability of the Object Store rest-api. - Dimensions: * component: the API being checked * rest-api: The Swift object-storage API (object put, get, and delete) * hostname: Set to '_' * observer_host: The host reporting the metric. * service: object-storage * url: the base URL of the REST service called - Value Class: The average availability as reported by the per-minute metric throughout the last 24 hours. - Description This metric reports the average of all the collected records in the swiftlm.umon.target.val.avail_minute metric data. This is a walking average over these approximately per-minute states of the Swift Object Store.
The most basic case is a whole day of successful per-minute records, which will average to 100% availability. If there is any downtime throughout the day resulting in gaps of data which are two minutes or longer, the per-minute availability data will be "back filled" with an assumption of a down state for all the per-minute records which did not exist during the non-reported time. Because this is a walking average of approximately 24 hours' worth of data, any outage will take 24 hours to be purged from the dataset. - Troubleshooting/Resolution: This metric reports a summarized state of the uptime monitor metrics; its troubleshooting and resolution therefore follow from resolving the root causes identified by the latency and state metrics above. 07070100000027000081A4000003E800000064000000015BE06E0300000B69000000000000000000000000000000000000004500000000python-swiftlm-8.0+git.1541434883.e0ebe69/doc/source/test_runner.rst .. code:: (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP (c) Copyright 2017 SUSE LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Developing swiftlm-scan Checks ============================== Adding a Check -------------- If the check fits into the theme of an existing folder, place it there and register it (see below). If the check does not fit into the current folder structure, create a new folder and inside that folder place a blank `__init__.py` file and add your check script.
Registering a Check ------------------- Checks must have a function named `main` that is the entry point to the check. The docstring for the main function will become the console help text for the check, so it should be short but describe what the check accomplishes. Registering a check requires you to add it to the `setup.py` file. See below for an example:: # With the directory structure below. $ touch swiftlm/swift/new_check.py $ cat <<END > swiftlm/swift/new_check.py def main(): """ Command line help text relevant to new check. """ print('new check ok') END This sets up an example check file. Real checks should return a list of `MetricData` instances or an individual `MetricData` instance, which is converted to Monasca output (see other checks for an example of how to do this). You then need to edit the `setup.py` file and enter the check in the `entry_points` section. Checks are entered in the `swiftlm.plugins` list. The example below shows how the entry will look. Other sections may be present but are not relevant to adding a new check:: entry_points={ 'swiftlm.plugins': [ 'new-check = swiftlm.swift.new_check:main', ], } `new-check` is the name of the check at the command line. Check names are not required to match their file names, but it is a good convention. Running the new check is then just a matter of updating the swiftlm installation and running it:: $ swiftlm-scan --new-check This runs the check by itself:: $ swiftlm-scan If no flags are passed to `swiftlm-scan`, all checks, including `new-check`, are run. This does not need to be configured beyond adding the check to `setup.py`. Line length limits are not enforced on the `setup.py` file because of the levels of indentation and long import paths.
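Pulling the steps above together, a complete check module might look like the following minimal sketch. The metric is shown here as a plain dict carrying the field names used elsewhere in swiftlm's aggregation code ('metric', 'dimensions', 'value', 'value_meta', 'timestamp'); a real check should construct and return `MetricData` instances instead, as the existing checks do, and the metric name and values below are illustrative only.

```python
import socket
import time


def main():
    """ Example check: always reports an 'ok' state. """
    # A stand-in for a MetricData instance: a plain dict with the same
    # fields emitted elsewhere in swiftlm. A value of 0 is used here to
    # mean ok; 2 would indicate a failure.
    return [{
        'metric': 'swiftlm.example.new_check',
        'dimensions': {'service': 'object-storage',
                       'hostname': socket.gethostname()},
        'value': 0,
        'value_meta': {'msg': 'new check ok'},
        'timestamp': time.time(),
    }]
```

Registered as `new-check = swiftlm.swift.new_check:main` in `setup.py`, this would then be run with `swiftlm-scan --new-check`.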
At the time of writing there are some reserved names; these are: * h * help * v * verbose * format 07070100000028000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000002E00000000python-swiftlm-8.0+git.1541434883.e0ebe69/etc07070100000029000081A4000003E800000064000000015BE06E03000000D5000000000000000000000000000000000000004900000000python-swiftlm-8.0+git.1541434883.e0ebe69/etc/ardana_storage.conf.sample[swift_config] boot_label = boot disk_pattern = /dev/sd[a-z]\+$ file_system = xfs mount_dir = /srv/node/ file_prefix = /tmp/swift_drive_info_ net_interface = eth0 drive_list = [/dev/sdb, disk0], [/dev/sdc, disk1] 0707010000002A000081A4000003E800000064000000015BE06E0300000271000000000000000000000000000000000000003B00000000python-swiftlm-8.0+git.1541434883.e0ebe69/requirements.txt# The order of packages is significant, because pip processes them in the order # of appearance. Changing the order has an impact on the overall integration # process, which may cause wedges in the gate later.
eventlet!=0.18.3,!=0.20.1,<0.21.0,>=0.18.2 # MIT setuptools!=24.0.0,!=34.0.0,!=34.0.1,!=34.0.2,!=34.0.3,!=34.1.0,!=34.1.1,!=34.2.0,!=34.3.0,!=34.3.1,!=34.3.2,!=36.2.0,>=16.0 # PSF/ZPL netifaces>=0.10.4 # MIT enum34;python_version=='2.7' or python_version=='2.6' or python_version=='3.3' # BSD python-swiftclient>=3.2.0 # Apache-2.0 python-keystoneclient>=3.8.0 # Apache-2.0 PyYAML>=3.10.0 # MIT psutil>=3.2.2 # BSD 0707010000002B000081A4000003E800000064000000015BE06E030000004D000000000000000000000000000000000000003400000000python-swiftlm-8.0+git.1541434883.e0ebe69/setup.cfg[build_sphinx] source-dir = doc/source build-dir = doc/build all_files = 1 0707010000002C000081A4000003E800000064000000015BE06E0300000B2A000000000000000000000000000000000000003300000000python-swiftlm-8.0+git.1541434883.e0ebe69/setup.py # (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. 
# from setuptools import setup, find_packages from codecs import open from os import path here = path.abspath(path.dirname(__file__)) def requirements(): with open(here + '/requirements.txt', 'r') as f: return [y.strip() for y in f.readlines() if y.strip()] setup( name='swiftlm', version='3.1.1', description='Lifecycle management for OpenStack Swift systems', author='SUSE LLC', author_email='ardana@googlegroups.com', url='https://github.com/ArdanaCLM', license='Apache 2.0', classifiers=[ 'Development Status :: 3 - Alpha', 'License :: OSI Approved :: Apache 2.0', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3.3', 'Programming Language :: Python :: 3.4', ], keywords='OpenStack Swift', packages=find_packages(exclude=['docs', 'etc', 'tests']), install_requires=requirements(), entry_points={ 'console_scripts': [ 'swiftlm-scan = swiftlm.cli.runner:main', 'swiftlm-uptime-mon = swiftlm.cli.uptime_mon:main', 'swiftlm-ring-supervisor = swiftlm.cli.supervisor:main', 'swiftlm-drive-provision = swiftlm.cli.drive_provision:main', 'swiftlm-probe-100-continue = swiftlm.cli.probe_100_continue:main', 'swiftlm-monasca = swiftlm.cli.jahmoncli:main', 'swiftlm-scout = swiftlm.cli.scout:main', 'swiftlm-aggregate = swiftlm.cli.aggregate:main', 'swiftlm-memcached = swiftlm.cli.memcached:main', 'swiftlm-log-tailer = swiftlm.cli.log_tailer:main', ], 'swiftlm.plugins': [ 'check-mounts = swiftlm.systems.check_mounts:main', 'connectivity = swiftlm.systems.connectivity:main', 'system = swiftlm.systems.system:main', 'drive-audit = swiftlm.swift.drive_audit:main', 'file-ownership = swiftlm.swift.file_ownership:main', 'swift-services = swiftlm.swift.swift_services:main', 'replication = swiftlm.swift.replication:main', 'ssacli = swiftlm.hp_hardware.ssacli:main', ], }, include_package_data=True, zip_safe=False, )
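At runtime, tools such as `swiftlm-scan` can locate the checks registered under `swiftlm.plugins` through setuptools' entry-point machinery. The sketch below is illustrative, not swiftlm's actual runner code; `discover_plugins` is a hypothetical helper, and it only finds entry points for packages that are actually installed.

```python
import pkg_resources


def discover_plugins(group='swiftlm.plugins'):
    """ Map entry-point names to their (unloaded) entry points. """
    # Each entry point corresponds to one 'name = module:func' line in
    # the entry_points section of setup.py; calling ep.load() would
    # import the module and return its main() callable.
    return {ep.name: ep for ep in pkg_resources.iter_entry_points(group)}
```

With swiftlm installed, `discover_plugins()` would contain keys such as `check-mounts` and `connectivity`; for an unknown group it simply returns an empty dict.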
0707010000002D000041ED000003E8000000640000000A5BE06E0300000000000000000000000000000000000000000000003200000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm0707010000002E000081A4000003E800000064000000015BE06E030000002F000000000000000000000000000000000000003E00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/__init__.pyCONFIG_FILE = '/etc/swiftlm/swiftlm-scan.conf' 0707010000002F000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000003600000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli07070100000030000081A4000003E800000064000000015BE06E0300000000000000000000000000000000000000000000004200000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/__init__.py07070100000031000081ED000003E800000064000000015BE06E03000034EE000000000000000000000000000000000000004300000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/aggregate.py# # (c) Copyright 2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. 
# """ Aggregate scouted data and produce metric file """ import json import optparse import socket import sys import time import yaml from swiftlm.utils.scout import SwiftlmScout from swiftlm.utils.utility import Aggregate, lock_file def process_collected(data, desired, dimensions, timestamp): """ Process collected data and derive aggregated metrics :param data: the data from scouting the nodes :param desired: a list of aggregations to perform :return: metrics """ metrics = [] if 'async' in desired: async(data, metrics, dimensions, timestamp) if 'diskusage' in desired: diskusage(data, metrics, dimensions, timestamp) if 'ringmd5' in desired: ringmd5(data, metrics, dimensions, timestamp) if 'load' in desired: load(data, metrics, dimensions, timestamp) if 'replication' in desired: replication(data, metrics, dimensions, timestamp) return metrics def async(data, metrics, dimensions, timestamp): agr = Aggregate() if not data.get('async'): return for name, item in data.get('async').items(): try: agr.add(item.get('async_pending')) except TypeError: pass metrics.append({'metric': 'swiftlm.async_pending.cp.total.queue_length', 'dimensions': dimensions, 'value': agr.total, 'value_meta': {}, 'timestamp': timestamp}) def diskusage(data, metrics, dimensions, timestamp): avail = Aggregate() used = Aggregate() size = Aggregate() usage = Aggregate() if not data.get('diskusage'): return for name, drives in data.get('diskusage').items(): try: for drive in drives: try: avail.add(int(drive.get('avail'))) used.add(int(drive.get('used'))) size.add(int(drive.get('size'))) usage.add(100.0 * float(drive.get('used')) / float(drive.get('size'))) except ValueError: pass # directory in /srv/node, not mounted FS except TypeError: pass metrics.append({'metric': 'swiftlm.diskusage.cp.total.avail', 'dimensions': dimensions, 'value': avail.total, 'value_meta': {}, 'timestamp': timestamp}) metrics.append({'metric': 'swiftlm.diskusage.cp.total.used', 'dimensions': dimensions, 'value': used.total, 
'value_meta': {}, 'timestamp': timestamp}) metrics.append({'metric': 'swiftlm.diskusage.cp.total.size', 'dimensions': dimensions, 'value': size.total, 'value_meta': {}, 'timestamp': timestamp}) metrics.append({'metric': 'swiftlm.diskusage.cp.avg.usage', 'dimensions': dimensions, 'value': usage.avg, 'value_meta': {}, 'timestamp': timestamp}) metrics.append({'metric': 'swiftlm.diskusage.cp.min.usage', 'dimensions': dimensions, 'value': usage.min, 'value_meta': {}, 'timestamp': timestamp}) metrics.append({'metric': 'swiftlm.diskusage.cp.max.usage', 'dimensions': dimensions, 'value': usage.max, 'value_meta': {}, 'timestamp': timestamp}) def replication(data, metrics, dimensions, timestamp): # Object data can be in either replication or replication/object # Object keys are prefixed with object_. The a/c are not. object_data = data.get('replication', data.get('replication/object', {})) account_data = data.get('replication/account', {}) container_data = data.get('replication/container', {}) for pre, styp, data in [(p, s, d) for (p, s, d) in [ ('object_', 'object', object_data), ('', 'account', account_data), ('', 'container', container_data)] if d]: last = Aggregate() duration = Aggregate() for name, item in data.items(): try: replication_last = item.get('%sreplication_last' % pre) ago = time.time() - replication_last last.add(ago) duration.add(item.get('%sreplication_time' % pre)) except TypeError: pass replication_dimensions = dict(dimensions) # make copy replication_dimensions.update({'component': '%s-replicator' % styp}) metrics.append({'metric': 'swiftlm.replication.cp.max.%s_last' % styp, 'dimensions': replication_dimensions, 'value': last.max, 'value_meta': {}, 'timestamp': timestamp}) metrics.append({'metric': 'swiftlm.replication.cp.avg.' 
'%s_duration' % styp, 'dimensions': replication_dimensions, 'value': duration.avg, 'value_meta': {}, 'timestamp': timestamp}) def load(data, metrics, dimensions, timestamp): if not data.get('load'): return fivemin = Aggregate() for host, item in data.get('load').items(): try: fivemin.add(item.get('5m', 0)) except TypeError: pass metrics.append({'metric': 'swiftlm.load.cp.avg.five', 'dimensions': dimensions, 'value': fivemin.avg, 'value_meta': {}, 'timestamp': timestamp}) metrics.append({'metric': 'swiftlm.load.cp.max.five', 'dimensions': dimensions, 'value': fivemin.max, 'value_meta': {}, 'timestamp': timestamp}) metrics.append({'metric': 'swiftlm.load.cp.min.five', 'dimensions': dimensions, 'value': fivemin.min, 'value_meta': {}, 'timestamp': timestamp}) def ringmd5(data, metrics, dimensions, timestamp): if not data.get('ringmd5'): return ringdata = {} hostdata = {} for host, rings in data.get('ringmd5').items(): try: for ringname, checksum in rings.items(): if not ringdata.get(ringname): ringdata[ringname] = set([checksum]) else: ringdata[ringname].add(checksum) if not hostdata.get(host): hostdata[host] = [ringname] else: hostdata[host].append(ringname) except TypeError: pass problem = False for ring, checksums in ringdata.items(): if len(checksums) > 1: # Checksums differ problem = True expected_number_rings = None for host, rings in hostdata.items(): if expected_number_rings is None: # Use first host as canonical expected_number_rings = len(rings) if expected_number_rings != len(rings): # This host has different number of rings than first host problem = True if problem: value = 2 msg = 'Checksum or number of rings not the same on all hosts' else: value = 0 msg = 'Rings are consistent on all hosts' metrics.append({'metric': 'swiftlm.md5sum.cp.check.ring_checksums', 'dimensions': dimensions, 'value': value, 'value_meta': {'msg': msg}, 'timestamp': timestamp}) def scout_main(): """ Gather data and create metrics output """ usage = ''' usage: {prog} [--metrics= | 
-] [--conf=] [--all] | [--async] [--diskusage] [--ringmd5] [--replication] [--load] [--timeout=] [--verbose] [--outformat= yaml | json] Examples: {prog} --all --metrics=/var/cache/swiftlm/aggregated.json cat /var/cache/swiftlm/aggregated.json | python -mjson.tool {prog} --async --outformat=yaml '''.format(prog='swiftlm-aggregate') args = optparse.OptionParser(usage) args.add_option('--metrics', metavar='METRICS_FILE', default=None, help='File to dump metrics into') args.add_option('--all', action='store_true', default=False, help='Aggregate everything') args.add_option('--async', action='store_true', default=False, help='Aggregate async') args.add_option('--diskusage', action='store_true', default=False, help='Aggregate disk usage') args.add_option('--ringmd5', action='store_true', default=False, help='Check ringmd5') args.add_option('--load', action='store_true', default=False, help='Aggregate 5m load average') args.add_option('--replication', action='store_true', default=False, help='Aggregate replication data') args.add_option('--outformat', type='string', metavar='FORMAT', help='Format of output.' ' Supported values are:' ' "yaml", "json" (default)', default='json') args.add_option('--timeout', type='int', metavar='SECONDS', help='Time to wait for a response from a server', default=5) args.add_option('--conf', default='/etc/swiftlm/scout.conf', help='Reserved for future use') args.add_option('--verbose', action='store_true', help='Print verbose info. 
Useful to troubleshoot.') args.add_option('--show_errors', action='store_true', default=False, help='Do not suppress errors') options, arguments = args.parse_args() aggregations = set() if options.all: aggregations = set(['async', 'diskusage', 'ringmd5', 'load', 'replication']) if options.async: aggregations.add('async') if options.diskusage: aggregations.add('diskusage') if options.ringmd5: aggregations.add('ringmd5') if options.load: aggregations.add('load') if options.replication: aggregations.add('replication') dimensions = {'service': 'object-storage', 'observer_host': socket.gethostname(), 'hostname': '_'} timestamp = time.time() suppress_errors = True if options.show_errors: suppress_errors = False if options.verbose: suppress_errors = False recon_data = SwiftlmScout({}, suppress_errors=suppress_errors, verbose=options.verbose, timeout=options.timeout) recon_data.scout_aggregate() collected = recon_data.get_results() metrics = process_collected(collected, aggregations, dimensions, timestamp) if options.outformat == 'json': items = [] for item in metrics: items.append(json.dumps(item)) dumped_metrics = '[\n' dumped_metrics += ',\n'.join(items) dumped_metrics += '\n]\n' elif options.outformat == 'yaml': dumped_metrics = yaml.safe_dump(metrics, allow_unicode=True, default_flow_style=False) else: print('Invalid value for --outformat') sys.exit(1) out_stream = sys.stdout if options.metrics: if options.metrics == '-': out_stream = sys.stdout out_stream.write(dumped_metrics) else: try: with lock_file(options.metrics, 2, unlink=False) as cf: cf.truncate() cf.write(dumped_metrics) except (Exception, Timeout) as err: print('ERROR: %s' % err) sys.exit(1) def main(): try: scout_main() except KeyboardInterrupt: print('\n') if __name__ == '__main__': main() 07070100000032000081A4000003E800000064000000015BE06E030000B2C0000000000000000000000000000000000000004900000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/drive_provision.py# # (c) Copyright 2015 Hewlett 
Packard Enterprise Development LP # (c) Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # """ This is a Python script that will discover the unpartitioned drives on a node, format them by partitioning and creating a filesystem for them. Both the partition and the filesystem will then be identically labelled in the format x"h"z where: x = 8 characters representing the ip address of the node in hex z = 3 digits representing the disk number The script will also allow the operator to choose individual options to perform on the disks with different parameters such as: --check -c --mount -m --label -l --partition -p --all -a """ import ConfigParser import json from optparse import OptionParser import os from random import randint import shutil import signal import socket import subprocess import sys import threading import time from swiftlm.utils.drivedata import Drive, LogicalVol, DISK_MOUNT, LVM_MOUNT, \ SwiftlmInvalidConfig from swiftlm.utils.utility import run_cmd # TODO - Will need to convert most if not all of the output messages to log # entries when logging is set up host_file = "/etc/hosts" conf_file = "/etc/swift/ardana_storage.conf" thread_timeout = 1800 supported_file_systems = ["xfs"] disk_model_file = "/etc/swift/disk_models.yml" class TimeoutError(Exception): pass # TODO - Will need to use config processor / Ardana input to # determine the IP or what interface to check.
For now will just return the # IP that corresponds to the hostname def get_my_ip(hostname): """ Get node's IP address that corresponds to the hostname """ return socket.gethostbyname(socket.gethostname()) def three_dig_format(digit): """ Create string of 3 digits given any digit < 1000 """ if digit.isdigit() and len(digit) < 4: if len(digit) == 3: return digit elif len(digit) == 2: return "0" + digit elif len(digit) == 1: return "00" + digit def get_ip_label(ip): """ Format ip address of the node and convert into label The label will consist of hexadecimal values of the ip. Each octet will be converted into its corresponding hex value and the whole label will be saved as an 8 character string. For example, 10.255.0.192 would have a label of 0AFF00C0 """ ip_label = "" for octet in ip.split("."): hex_val = hex(int(octet))[2:] if len(hex_val) == 1: hex_val = "0" + hex_val ip_label += hex_val return ip_label def get_drive_partitions(parted_d): """ Returns all partitions of a given drive. The partition path is appended to the end of each partition info """ status, output = run_cmd('/sbin/parted ' + parted_d + ' print') if status == 0: if "unrecognised disk label" in output: return [] else: partitions = output.split("Flags")[2].split("\n") else: print("Error reading %s - %s" % (parted_d, output)) sys.exit(1) # Remove blank entries for p in partitions: if not p: partitions.remove(p) # add partition path information at the end of each string # ['{p_str}'] --> ['{p_str} /dev/mapper/mpathb-part1 '] partition_paths = get_drive_partition_paths(parted_d) partitions = ["%s %s " % (x, y) for x, y in zip(partitions, partition_paths)] return partitions def is_lvm(name): """ Check if device is marked as an lvm """ if "lvm" in name: return True return False def is_partition(device): """ Check if a device is a partition """ if device[-1].isdigit(): return True return False def check_data_matches(cp_input_list): """ Verify config processor data and hardware match :param cp_input_list: list of tuples
that show the devices selected by the configuration (drive/part/lvm name, swift device name) :return data_conflict: boolean value giving conflict status """ data_conflict = False for cp_device, swift_name in cp_input_list: # Check for lvm if is_lvm(swift_name): if not os.path.islink(cp_device): print("%s is targeted for swift use but volume is not found" % (cp_device)) data_conflict = True continue # Partition if last character is a digit elif is_partition(cp_device): # Assume no more than 9 partitions in a drive partition = swift_name[-1] # SLES partition may have paths like /dev/mapper/mpathb-part1 if ("-part" in cp_device.lower() or "_part" in cp_device.lower()): drive = cp_device[0:-6] else: drive = cp_device[0:-1] else: drive = cp_device if os.path.exists(drive): # Need to check partitions if is_partition(cp_device): part_match = False partitions = get_drive_partitions(drive) for part in partitions: if part: part = ' '.join(part.split()) if part[0] == partition: part_match = True if not part_match: print("%s%s is targeted for swift use but device %s does " "not have partition %s" % (drive, partition, drive, partition)) data_conflict = True else: print("%s is targeted for swift use but device is not found" % (drive)) data_conflict = True return data_conflict def handler(signum, frame): """ Handler that is called if the parted command runs longer than the alarm timeout. This will then kill the parted command. """ pid_out = run_cmd('ps -A') for line in pid_out.output.splitlines(): if 'parted' in line: pid = int(line.split(None, 1)[0]) os.kill(pid, signal.SIGKILL) raise TimeoutError("Parted command is hanging for device") def find_parted_drives(drives): """ Find partitioned devices This goes through each device on the node. If there is no partition present it will be added to a list for later partitioning.
Output is divided into partitioned and unpartitioned drives :param drives: list of all drives specified by the swift input model :return partitioned_drives: list of partitioned drives :return unpartitioned_drives: list of unpartitioned drives """ partitioned_drives = [] unpartitioned_drives = [] signal.signal(signal.SIGALRM, handler) for drive, swift_name in drives: # Ignore lvms if is_lvm(swift_name): continue status = -1 signal.alarm(20) try: status, output = run_cmd('/sbin/parted ' + drive + ' print') except TimeoutError as exc: print("%s %s - killing command" % (exc, drive)) if status == 0 and "Error" not in output: partitioned_drives.append(drive) elif status == -1: continue else: unpartitioned_drives.append(drive) # Disable the alarm signal.alarm(0) return partitioned_drives, unpartitioned_drives def get_disk_label_no(drive_partitions, ip_label): """ Returns the swift disk number of a partition that has already been labelled """ partitions = drive_partitions.split() num = "-1" for part in partitions: if (ip_label + "h") in part: _, num = part.strip("]'").split("h") return num def get_drive_partition_paths(parted_d): """ Returns all partition paths of a given drive """ # ~>lsblk -r -p -o name,type /dev/mapper/mpathb # ~>NAME TYPE # ~>/dev/mapper/mpathb mpath # ~>/dev/mapper/mpathb-part1 part output = os.popen('lsblk -r -p -o name,type ' + parted_d).read() p_paths = [] if output: if "not a block device" in output: return [] else: output_lines = output.split("\n") if output_lines and len(output_lines) > 1: for line in output_lines[1:]: line_cols = line.split() if line_cols and line_cols[1] == 'part': p_paths.append(line_cols[0]) else: print("Error reading partitioned drive %s - %s" % (parted_d, output)) sys.exit(1) return p_paths def separate_labelled_devices(ip_label, parted_drives, supported_file_systems, raw_drives, boot_label): """ Separate drives that are already labelled (including boot flags) Finds drives that are labelled correctly.
It also keeps a record of the disk numbers that are labelled :param ip_label: hex representation of the ip address that is used for the swift device labels :param parted_drives: list of devices that are already partitioned :param supported_file_systems: filesystems that are supported :param raw_drives: drives that are not yet partitioned :param boot_label: label associated with any boot device :return unlabelled_drives: drives that have not been labelled yet :return raw_drives: drives that have not been formatted yet :return dsk_label_list: list of devices that have been labelled and their device number :return unlabelled_partitions: partitions that have not been labelled yet """ unlabelled_drives = [] dsk_label_list = [] unlabelled_partitions = [] for a_drive in parted_drives: drive_partitions = get_drive_partitions(a_drive) # Regular drive (one partition) if len(drive_partitions) == 1: drive_partition = str(drive_partitions) if ip_label not in drive_partition: raw = False for fs in supported_file_systems: if fs in drive_partition: unlabelled_drives.append(a_drive) raw = True if not raw: if boot_label not in drive_partition: raw_drives.append(a_drive) # Keep a record of the disk numbers else: number = get_disk_label_no(drive_partition, ip_label) if number != "-1": partition_path = drive_partition.split()[-2] dsk_label_list.append([str(int(number)), partition_path]) # Raw drive (no partitions) elif len(drive_partitions) == 0: raw_drives.append(a_drive) # Drive with multiple partitions else: for drive_part in drive_partitions: if boot_label not in drive_part: if ip_label not in drive_part: partition_path = drive_part.split()[-1] unlabelled_partitions.append((a_drive, partition_path)) else: number = get_disk_label_no(drive_part, ip_label) if number != "-1": partition_path = drive_part.split()[-1] dsk_label_list.append([str(int(number)), partition_path]) return unlabelled_drives, raw_drives, dsk_label_list, unlabelled_partitions def
create_volume_fs(cp_input_list, disk_label_list, iplabel): """ Create xfs filesystems for any volumes that don't have one. Also, add labelled volumes to list As well as adding the labelled volumes to the disk label list this will return a list of unlabelled volumes. Furthermore, any unlabelled volume that doesn't have an xfs filesystem will be given one. :param cp_input_list: list of tuples that show the devices selected by the configuration (drive/part/lvm name, swift device name) :param disk_label_list: list of devices that are labelled and their swift device number ("v" is with the swift number if the device is an lvm) :param iplabel: hex representation of the ip address that is used for the swift device labels :return blank_volumes: volumes that have yet to be labelled :return disk_label_list: list of devices that have been labelled and their device number """ blank_volumes = [] for full_vol, swift_name in cp_input_list: # Only interested in logical volumes if "lvm" in swift_name: lvm_l_stat, lvm_l_out = run_cmd("/usr/sbin/xfs_admin -l " + full_vol) # Create an xfs filesystem if there is not already one present if lvm_l_stat != 0: # Make sure that the volume is not being used already # 1) Check if it has another filesystem ext_status, _ = run_cmd("/sbin/e2label " + full_vol) if ext_status == 0: print("Cannot proceed - %s has a non-compatible swift fs" % (full_vol)) sys.exit(1) # 2) Check if already mounted.
* Note - os.path.ismount() # can't be used as cannot be certain of mount point at this # stage _, mount_status = run_cmd("df") logical_vol = full_vol.split("/")[-1] if logical_vol in mount_status: print("Cannot proceed - %s is already mounted" % (full_vol)) sys.exit(1) lvm_xfs_stat, lvm_xfs_out = run_cmd( "/sbin/mkfs.xfs -f" + " " + full_vol) if lvm_xfs_stat != 0: print("Error: Failed to create filesystem for volume %s" " - %s" % (full_vol, lvm_xfs_out)) sys.exit(1) blank_volumes.append(full_vol) # Otherwise check if labelled correctly else: if (iplabel + "v") in lvm_l_out: nop, number = lvm_l_out.strip('"').split("v") disk_label_list.append(["v" + str(int(number)), full_vol]) else: blank_volumes.append(full_vol) return blank_volumes, disk_label_list def format_drives(blank_drives, unparted_drives, fs, cp_input_list): """ Partition all raw drives As well as partitioning and adding a fs to a drive, the function will also add them to the unlabelled drives list :param blank_drives: list of partitioned drives that do not have a swift label :param unparted_drives: list of drives that need to be partitioned :param fs: type of filesystem to create :param cp_input_list: list of tuples that show the devices selected by the configuration (drive/part/lvm name, swift device name) :return blank_drives: see above """ for unparted in unparted_drives: if cp_input_list: # Make sure that the drive is marked for swift use by the config # processor config_match = False for full_vol, swift_name in cp_input_list: if unparted == full_vol: config_match = True if not config_match: print("%s has not been selected for swift use" % (unparted)) continue # Partition the drive: create_status, create_output = run_cmd( "/sbin/parted -s " + unparted + " mklabel gpt") if create_status != 0: print("Error: Failed to create partition table for disk %s - %s" % (unparted, create_output)) sys.exit(1) partition_status, partition_output = run_cmd( "/sbin/parted -s -- " + unparted + " mkpart primary 1 -1")
if partition_status != 0: print("Error: Failed to create partition table for disk %s - %s" % (unparted, partition_output)) sys.exit(1) blockdev_retry_max = 5 blockdev_retry_count = 0 while blockdev_retry_count < blockdev_retry_max: blockdev_status, blockdev_output = run_cmd( "/sbin/partprobe %s" % unparted) if blockdev_status != 0: blockdev_retry_count += 1 time.sleep(blockdev_retry_count) else: break if blockdev_retry_max == blockdev_retry_count: print("Error: Failed to reread the disks partition table %s - %s" % (unparted, blockdev_output)) sys.exit(1) # Need to introduce a delay between creating the partition and the # file system time.sleep(1) # Only supports xfs for now if fs == "xfs": partition_paths = get_drive_partition_paths(unparted) for p_path in partition_paths: filesys_status, filesys_output = run_cmd( "/sbin/mkfs.xfs -f " + p_path) if filesys_status != 0: print("Error: Failed to create filesystem for disk %s - %s" % (unparted, filesys_output)) sys.exit(1) blank_drives.append(unparted) return blank_drives def confirm_config(device, cp_input_list): """ Confirm that the drive is marked for swift use by the config processor :param device: the device that is being checked :param cp_input_list: list of tuples that show the devices selected by the configuration(drive/part/lvm name, swift device name) :return config_match: Boolen to tell if the device is in the confuration processor :return config_disk_val: Device number that is suggested for label use """ config_match = False config_disk_val = -1 if cp_input_list: for full_vol, swift_name in cp_input_list: if device == full_vol: config_match = True # Remove "disk" config_disk_val = swift_name[4:] return config_match, config_disk_val def get_full_vol_label(vol_val, disk_label_list, cp_input_list, ip_label): """ Returns the full fs label for a swift logical volume :param vol_val: swift volume number suggested by the config processor :param disk_label_list: list of devices that are labelled and their swift 
device number ("v" is with the swift number if the device is an lvm) :param cp_input_list: list of tuples that show the devices selected by the configuration(drive/part/lvm name, swift device name) :param ip_label: hex representation of the ip address that is used for the swift device labels :return full_vol_label: :return: actual volume number (v + digit) """ vol_count = 0 # Create an array with just the label number for volumes temp_vol_labels = [] for disk_label in disk_label_list: # Only include logical volumes which are highlighted by a "v" in # the swift device number if "v" in disk_label[0]: temp_vol_labels.append(int(disk_label[0][1:])) vol_number_found = False # If the volume number is already in use, select the next available value if cp_input_list: if int(vol_val) not in temp_vol_labels: vol_count = vol_val vol_number_found = True if not vol_number_found: vol_lab_match = True while vol_lab_match: if vol_count in temp_vol_labels: vol_count = vol_count + 1 else: vol_lab_match = False full_vol_label = ip_label + "v" + three_dig_format(str(vol_count)) return full_vol_label, "v" + str(vol_count) def get_full_label(disk_val, disk_label_list, cp_input_list, ip_label): """ Returns the full fs and partition label for a swift "disk" :param disk_val: swift device number suggested by the config processor :param disk_label_list: list of devices that are labelled and their swift number ("v" is with the swift number if the device is an lvm) :param cp_input_list: list of tuples that show the devices selected by the configuration(drive/part/lvm name, swift device name) :param ip_label: hex representation of the ip address that is used for the swift device labels :return full_label: full label to be used on a partiton/file system :return disk_count: three-digit format of the device number """ disk_count = 0 # Create an array with just the label number for disks/partitions temp_labels = [] for disk_label in disk_label_list: # Don't include logical volumes if "v" not in 
disk_label[0]: temp_labels.append(int(disk_label[0])) disk_number_found = False # If the disk number is already in use, select the next available value if cp_input_list: if int(disk_val) not in temp_labels: disk_count = disk_val disk_number_found = True if not disk_number_found: lab_match = True while lab_match: if disk_count in temp_labels: disk_count = disk_count + 1 else: lab_match = False full_label = ip_label + "h" + three_dig_format(str(disk_count)) return full_label, disk_count def label_volumes(blank_volumes, ip_label, disk_label_list, cp_input_list): """ Label the filesystem of a logical volume :param blank_volumes: volumes that have not been labelled yet :param disk_label_list: list of devices that are labelled and their swift device number ("v" is with the swift number if the device is an lvm) :param ip_label: hex representation of the ip address that is used for the swift device labels :param cp_input_list: list of tuples that show the devices selected by the configuration(drive/part/lvm name, swift device name) :return disk_label_list: see above """ for blank_vol in blank_volumes: # No need to confirm that volume is marked for swift use as only # volumes in the config processor are included from the start swift_vol_match = False for full_vol, swift_name in cp_input_list: if blank_vol == full_vol: # Remove "lvm" config_vol_val = swift_name[3:] swift_vol_match = True if not swift_vol_match: print("%s has not been selected for swift use" % (blank_vol)) continue full_label, volume_number = get_full_vol_label(config_vol_val, disk_label_list, cp_input_list, ip_label) vol_label_status, vol_label_output = run_cmd( '/usr/sbin/xfs_admin -L "' + full_label + '" ' + blank_vol) if vol_label_status != 0: print("Error labelling xfs volume %s - %s" % (blank_vol, vol_label_output)) sys.exit(1) disk_label_list.append([str(volume_number), blank_vol]) return disk_label_list def label_drives(blank_drives, ip_label, fs, disk_label_list, cp_input_list): """ Label the 
partition and filesystem of a drive :param blank_drives: drives that have not been labelled yet :param disk_label_list: list of devices that are labelled and their swift device number ("v" is with the swift number if the device is an lvm) :param ip_label: hex representation of the ip address that is used for the swift device labels :param cp_input_list: list of tuples that show the devices selected by the configuration(drive/part/lvm name, swift device name) :return disk_label_list: see above """ for blank in blank_drives: config_match, config_disk_val = confirm_config(blank, cp_input_list) if not config_match: print("%s has not been selected for swift use" % (blank)) continue full_label, disk_number = get_full_label(config_disk_val, disk_label_list, cp_input_list, ip_label) p_lab_status, p_lab_output = run_cmd( '/sbin/parted -s ' + blank + ' name 1 "' + full_label + '"') if p_lab_status != 0: print("Error labelling partition %s - %s" % (blank, p_lab_output)) sys.exit(1) time.sleep(1) partition_paths = get_drive_partition_paths(blank) for p_path in partition_paths: xfs_label_status, xfs_label_output = run_cmd( '/usr/sbin/xfs_admin -L "' + full_label + '" ' + p_path) if xfs_label_status != 0: print("Error labelling xfs partition %s - %s" % (p_path, xfs_label_output)) # Revert the partition label back os.system('/sbin/parted -s ' + blank + ' name 1 "primary"') sys.exit(1) disk_label_list.append([str(disk_number), p_path]) return disk_label_list def label_partitions(blank_partitions, ip_label, fs, disk_label_list, cp_input_list): """ Label particular partitions and filesystems in a drive :param blank_partitions: partitions that have not been labelled yet :param disk_label_list: list of devices that are labelled and their swift device number ("v" is with the swift number if the device is an lvm) :param ip_label: hex representation of the ip address that is used for the swift device labels :param cp_input_list: list of tuples that show the devices selected by the 
        configuration (drive/part/lvm name, swift device name)
    :return disk_label_list: see above
    """
    for drive_path, blank_p in blank_partitions:
        config_match, config_disk_no = confirm_config(blank_p, cp_input_list)
        if not config_match:
            print("%s has not been selected for swift use" % (blank_p))
            continue
        full_label, disk_number = get_full_label(config_disk_no,
                                                 disk_label_list,
                                                 cp_input_list, ip_label)
        p_lab_status, p_lab_output = run_cmd(
            '/sbin/parted -s ' + drive_path + ' name ' + blank_p[-1] +
            ' "' + full_label + '"')
        if p_lab_status != 0:
            print("Error labelling partition %s - %s"
                  % (blank_p, p_lab_output))
            sys.exit(1)
        time.sleep(1)
        # Make sure that there is a filesystem to label
        part_status, part_output = run_cmd('/sbin/parted -s ' + blank_p +
                                           ' p')
        # Add the fs to the partition if it isn't there already
        if fs not in part_output and fs == "xfs":
            create_fs_stat, create_fs_out = run_cmd(
                '/sbin/mkfs.xfs -f ' + blank_p)
            if create_fs_stat != 0:
                print("Error: Failed to create filesystem for partition %s "
                      "- %s" % (blank_p, create_fs_out))
                sys.exit(1)
        fs_label_status, fs_label_output = run_cmd(
            '/usr/sbin/xfs_admin -L "' + full_label + '" ' + blank_p)
        if fs_label_status != 0:
            print("Error labelling xfs partition %s - %s"
                  % (blank_p, fs_label_output))
            # Revert the partition label back
            os.system('/sbin/parted -s ' + blank_p + ' name ' +
                      blank_p[-1] + ' "primary"')
            sys.exit(1)
        disk_label_list.append([str(disk_number), blank_p])
    return disk_label_list


def mount_by_label(mount_point, mount_label, return_val):
    """
    Mount a device by its label

    :param mount_point: directory that device will be mounted to
    :param mount_label: device's label
    :param return_val: list of device mounting results
    """
    command = "/bin/mount -o noatime,nodiratime,nobarrier,logbufs=8 -L " + \
        mount_label + " " + mount_point
    child = subprocess.Popen(command, shell=True, stderr=subprocess.PIPE)
    (outtext, errtext) = child.communicate()
    rc = child.returncode
    if rc != 0:
        return_val.append(command + ": status: " + str(rc) +
                          " (ERROR) - " + errtext + "\n")
    else:
        return_val.append(command + ": status: " + str(rc) + " (SUCCESS)\n")


def mount_devices(ip_label, disk_label_list, mount_dir, cp_input_list):
    """
    Function to mount all labelled devices. Returns a list of mounted drives

    :param ip_label: hex representation of the ip address that is used for
        the swift device labels
    :param disk_label_list: list of devices that are labelled and their swift
        device number ("v" is with the swift number if the device is an lvm)
    :param mount_dir: directory where the devices will be mounted to
    :param cp_input_list: list of tuples that show the devices selected by
        the configuration (drive/part/lvm name, swift device name)
    """
    mounts_to_check = []
    return_results = []
    mount_threads = []
    if not os.path.isdir(mount_dir):
        cmd_status, cmd_output = run_cmd("/bin/mkdir " + mount_dir)
        if cmd_status != 0:
            print("Error creating mount directory %s" % (mount_dir))
            sys.exit(1)
    for label_no in disk_label_list:
        # Logical volumes
        if "v" in label_no[0]:
            if cp_input_list:
                mount_vol_match = False
                for full_vol, swift_name in cp_input_list:
                    if ("lvm" + label_no[0][1:]) == swift_name:
                        # Remove "v" from label_no val when creating mount
                        # point
                        mount_point = os.path.join(mount_dir,
                                                   "lvm" + label_no[0][1:])
                        mount_vol_match = True
                if not mount_vol_match:
                    print("Not mounting lvm%s - it isn't marked for swift"
                          " use" % (label_no[0][1:]))
                    continue
            else:
                mount_point = os.path.join(mount_dir,
                                           "lvm" + label_no[0][1:])
        else:
            if cp_input_list:
                mount_match = False
                for full_vol, swift_name in cp_input_list:
                    if ("disk" + label_no[0]) == swift_name:
                        mount_point = mount_dir + "disk" + label_no[0]
                        mount_match = True
                if not mount_match:
                    print("Not mounting disk%s - it isn't marked for swift"
                          " use" % (label_no[0]))
                    continue
            else:
                mount_point = mount_dir + "disk" + label_no[0]
        # Create mount dir if it doesn't exist
        if not os.path.isdir(mount_point):
            cmp_status, cmp_output = run_cmd("/bin/mkdir " + mount_point)
            if cmp_status != 0:
                print("Error creating mount point %s - %s"
                      % (mount_point, cmp_output))
                sys.exit(1)
        # Make sure that the directories are owned by root:root
        chown_status, chown_out = run_cmd("/bin/chown root:root " +
                                          mount_point)
        if chown_status != 0:
            print("Error changing ownership of %s - %s"
                  % (mount_point, chown_out))
            sys.exit(1)
        if "v" in label_no[0]:
            mount_label = ip_label + "v" + three_dig_format(label_no[0][1:])
        else:
            mount_label = ip_label + "h" + three_dig_format(label_no[0])
        # Check if already mounted
        if os.path.ismount(mount_point):
            print("%s is already mounted" % (mount_point))
        else:
            t = threading.Thread(target=mount_by_label,
                                 args=(mount_point, mount_label,
                                       return_results))
            t.start()
            mount_threads.append(t)
            mounts_to_check.append([mount_label, mount_point])
    for mt in mount_threads:
        mt.join(thread_timeout)
    # TODO - this will soon be logged instead of printed out
    if return_results:
        print("%s" % ("Running: ".join(return_results)))
    # Confirm that the drives are mounted
    for drives in mounts_to_check:
        if os.path.isdir(drives[1]):
            # Double-check the directories are owned by swift:swift
            # NOTE - there is a corner-case where the thread_timeout value
            # expires and the ownership will not be changed here (the mount
            # thread may complete at a later time). In this case, it is
            # necessary that the diags pick up bad ownership asap
            chwn_status, chwn_out = run_cmd("/bin/chown swift:swift " +
                                            drives[1])
            if chwn_status != 0:
                print("Error changing ownership of %s - %s"
                      % (drives[1], chwn_out))
                sys.exit(1)
            print("Mounted to %s with label %s" % (drives[1], drives[0]))


def get_product_name():
    '''
    Returns the type of node (product)
    '''
    prod_stat, prod_out = run_cmd("dmidecode -s system-product-name")
    if prod_stat != 0:
        print("Error getting node product name - %s" % (prod_out))
        sys.exit(1)
    else:
        return prod_out


def generate_drive_info(my_ip, mount_dir, disk_label_list):
    '''
    Gather all the partitioned and labelled drive data on the node into a
    single structure.

    :param my_ip: ip address of the node
    :param mount_dir: directory where the swift devices were mounted to
    :param disk_label_list: list of devices that are labelled and their swift
        device number ("v" is with the swift number if the device is an lvm)
    :return node_info: dict with all the device information of the node
    '''
    server_type = get_product_name()
    node_info = {"hostname": socket.getfqdn(),
                 "ip_addr": my_ip,
                 "model": server_type}
    drive_info = []
    for lab in disk_label_list:
        # Get the size of the drive
        size_status, size_output = run_cmd("/sbin/blockdev --getsize64 " +
                                           lab[1])
        if size_status == 0:
            size = ((float(size_output) / 1024) / 1024) / 1024
        else:
            print("Error getting size of %s - %s" % (lab[1], size_output))
            sys.exit(1)
        # Check if drive is mounted
        if "v" in lab[0]:
            mount_to_check = os.path.join(mount_dir, LVM_MOUNT + lab[0][1:])
        else:
            mount_to_check = os.path.join(mount_dir, DISK_MOUNT + lab[0])
        if os.path.ismount(mount_to_check):
            is_mounted = True
        else:
            is_mounted = False
        if "v" in lab[0]:
            drive_info.append({"name": lab[1],
                               "swift_drive_name": "lvm" + lab[0][1:],
                               "size_gb": size,
                               "mounted": str(is_mounted)})
        else:
            drive_info.append({"name": lab[1],
                               "swift_drive_name": "disk" + lab[0],
                               "size_gb": size,
                               "mounted": str(is_mounted)})
    node_info["devices"] = drive_info
    return node_info


def write_to_info_file(node_data, fact_file):
    '''
    Create json file with drive info

    Captures all the partitioned and labelled drive data on the node and
    writes it to a json file. The data is presented in the following format:
    {
        "model": "",
        "hostname": "",
        "ip_addr": "",
        "devices": [
            {
                "size_gb": "device_size_in_gb",
                "mounted": "",
                "name": "/dev/sd",
                "swift_drive_name": "disk"
            },
            ...
        ]
    }
    '''
    # Add a random integer in case there is corruption due to the script
    # being run concurrently
    temp_file = "/tmp/tmp_swift_data" + str(randint(0, 999)) + ".txt"
    file_to_write = open(temp_file, "w")
    file_to_write.write(json.dumps(node_data, indent=4,
                                   separators=(',', ': ')))
    file_to_write.close()
    # Check if the directory of the destination file exists:
    if not os.path.isdir(os.path.dirname(fact_file)):
        print("Cannot write fact file - %s is not a directory"
              % (os.path.dirname(fact_file)))
        sys.exit(1)
    shutil.move(temp_file, fact_file)
    if not os.path.isfile(fact_file):
        print("Error writing file %s" % (fact_file))
        sys.exit(1)


def main():
    # Make sure that the user is root
    if not os.geteuid() == 0:
        print("Script must be run as root")
        sys.exit(1)
    args = OptionParser()
    args.add_option("-v", "--verbose", dest="verbose", action="store_true",
                    help="Give a more verbose output")
    args.add_option("-a", "--all_actions", dest="all_actions",
                    action="store_true",
                    help="Perform all actions on the drives")
    args.add_option("-p", "--partition", dest="partition",
                    action="store_true", help="Only partition the drives")
    args.add_option("-l", "--label", dest="label", action="store_true",
                    help="Only label the drives")
    args.add_option("-m", "--mount", dest="mount", action="store_true",
                    help="Only mount the drives")
    args.add_option("-c", "--check", dest="check", action="store_true",
                    help="Check config data and hardware match")
    options, arguments = args.parse_args()
    # Get configuration data from /etc/swift/ardana_storage.conf
    if os.path.isfile(conf_file):
        if options.verbose:
            print("Using config data from %s" % (conf_file))
        parser = ConfigParser.RawConfigParser()
        parser.read(conf_file)
        try:
            boot_label = parser.get("swift_config", "boot_label")
        except ConfigParser.NoOptionError:
            boot_label = "boot"
        try:
            fs = parser.get("swift_config", "file_system")
        except ConfigParser.NoOptionError:
            fs = "xfs"
        try:
            mount_dir = parser.get("swift_config", "mount_dir")
        except ConfigParser.NoOptionError:
            mount_dir = "/srv/node/"
        try:
            fact_file = parser.get("swift_config", "fact_file")
        except ConfigParser.NoOptionError:
            fact_file = "/etc/ansible/facts.d/swift_drive_info.fact"
    else:
        boot_label = "boot"
        fs = "xfs"
        mount_dir = "/srv/node/"
        fact_file = "/etc/ansible/facts.d/swift_drive_info.fact"
        if options.verbose:
            print("No %s - using default values" % (conf_file))
            print("*****************************")
            print("Boot Label = %s" % (boot_label))
            print("File System = %s" % (fs))
            print("Mount Directory = %s" % (mount_dir))
            print("File Prefix = %s" % (fact_file))
            print("Not using config processor drive entries")
            print("*****************************")
    # Get disk info from the disk_models.yml
    if os.path.isfile(disk_model_file):
        cp_input_list = []
        try:
            swift_drives = Drive.load(disk_model_file)
        except SwiftlmInvalidConfig as exc:
            print("ERROR: %s" % (exc))
            sys.exit(1)
        for e_drive in swift_drives:
            cp_input_list.append((e_drive.device,
                                  e_drive.swift_device_name))
        try:
            swift_log_vols = LogicalVol.load(disk_model_file)
        except SwiftlmInvalidConfig as exc:
            print("ERROR: %s" % (exc))
            sys.exit(1)
        for e_vol in swift_log_vols:
            full_vol = "/dev/" + str(e_vol.lvg) + "/" + e_vol.lvm
            cp_input_list.append((full_vol, e_vol.swift_lvm_name))
    else:
        print("Cannot continue - %s not present" % (disk_model_file))
        sys.exit(1)
    # No need to continue if there are no swift drives/lvms
    if not cp_input_list:
        print("No swift devices specified in input model for this node")
        sys.exit(0)
    mounted = []
    my_hostname = socket.getfqdn()
    my_ip = get_my_ip(my_hostname)
    ip_label = get_ip_label(my_ip)
    if options.verbose:
        print("IP hex label = %s" % (ip_label))
    # Can skip validation if ONLY mounting devices. Note that the validation
    # will still be performed if the "all_actions" option is selected, which
    # includes mounting the devices
    if not options.mount:
        data_conflict = check_data_matches(cp_input_list)
        if data_conflict:
            sys.exit(1)
        if options.check:
            sys.exit(0)
    parted_drives, raw_drives = find_parted_drives(cp_input_list)
    blank_drives, unparted_drives, disk_label_list, blank_partitions = \
        separate_labelled_devices(ip_label, parted_drives,
                                  supported_file_systems, raw_drives,
                                  boot_label)
    blank_volumes, disk_label_list = create_volume_fs(cp_input_list,
                                                      disk_label_list,
                                                      ip_label)
    if options.verbose:
        print("Current disk label list = %s" % (str(disk_label_list)))
        print("Unpartitioned disk list = %s" % (str(unparted_drives)))
        print("Unlabelled disk list = %s" % (str(blank_drives)))
        print("Unlabelled partition list = %s" % (str(blank_partitions)))
        print("Unlabelled volume list = %s" % (str(blank_volumes)))
    if options.all_actions or options.partition:
        if not unparted_drives:
            print("No drives that need to be partitioned")
        else:
            print("The following drives are not partitioned - %s"
                  % (str(unparted_drives)))
            blank_drives = format_drives(blank_drives, unparted_drives, fs,
                                         cp_input_list)
    if options.all_actions or options.label:
        if not blank_drives:
            print("No one-partition drives that need to be labelled")
        else:
            print("Label the following drives - %s" % (str(blank_drives)))
            disk_label_list = label_drives(blank_drives, ip_label, fs,
                                           disk_label_list, cp_input_list)
        if not blank_partitions:
            print("No multiple partition drives that need to be labelled")
        else:
            print("Label the following partitions - %s"
                  % (str(blank_partitions)))
            disk_label_list = label_partitions(blank_partitions, ip_label,
                                               fs, disk_label_list,
                                               cp_input_list)
        if not blank_volumes:
            print("No volumes that need to be labelled")
        else:
            print("Label the following volumes - %s" % (str(blank_volumes)))
            disk_label_list = label_volumes(blank_volumes, ip_label,
                                            disk_label_list, cp_input_list)
    if options.all_actions or options.mount:
        if not disk_label_list:
            print("No drives ready to be mounted")
        else:
            mount_devices(ip_label, disk_label_list, mount_dir,
                          cp_input_list)
    node_drive_info = generate_drive_info(my_ip, mount_dir, disk_label_list)
    write_to_info_file(node_drive_info, fact_file)


if __name__ == '__main__':
    main()
07070100000033000081ED000003E800000064000000015BE06E0300004A6C000000000000000000000000000000000000004300000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/jahmoncli.py#!/usr/bin/python
# (c) Copyright 2015 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017-2018 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
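The labelling scheme used by the drive-provisioning script above combines the hex IP label with "h" (disk/partition) or "v" (logical volume) and a three-digit device number. The sketch below is illustrative only: `get_ip_label()` and `three_dig_format()` are defined outside this chunk, so the hex-per-octet encoding here is an assumption inferred from how the labels are consumed; note that 8 + 1 + 3 = 12 characters, which fits within the 12-character XFS label limit.

```python
# Hypothetical stand-ins (NOT the swiftlm implementations) for
# get_ip_label() and three_dig_format(), showing how a label such as
# "c0a80a05h003" could be composed.

def hex_ip_label(ip_addr):
    # Assumed encoding: each dotted-quad octet as two hex digits,
    # e.g. "192.168.10.5" -> "c0a80a05"
    return ''.join('%02x' % int(octet) for octet in ip_addr.split('.'))


def three_digit(num_str):
    # Zero-pad the device number to three digits, e.g. "3" -> "003"
    return num_str.zfill(3)


def full_device_label(ip_addr, kind, number):
    # kind is "h" for a disk partition or "v" for a logical volume
    return hex_ip_label(ip_addr) + kind + three_digit(str(number))
```

For example, `full_device_label("192.168.10.5", "h", 3)` yields a 12-character label, which is the longest an XFS filesystem label can be.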
# import sys from os import environ from time import sleep import datetime from optparse import OptionParser import json from swiftlm.utils.jahmonapi import JahmonConnection, JahmonClientException verbose = False def print_verbose(msg): global verbose if verbose: print(msg) def print_result(result, fmt='compact'): name = result.get('name') dimensions = result.get('dimensions', {}) timestamp = result.get('timestamp', '') value = result.get('value', '') meta = result.get('value_meta', {}) if fmt == 'compact': dimvals = [] for key, val in dimensions.items(): dimvals.append(val) valuemeta = [] for key, val in meta.items(): valuemeta.append(val) dimmsg = ','.join(dimvals) valmsg = ','.join(valuemeta) print('%s %s %s {%s} {%s}' % (timestamp, name, value, dimmsg, valmsg)) elif fmt == 'flat': print('%s %s %s %s %s' % (timestamp, name, value, json.dumps(dimensions), json.dumps(meta))) elif fmt == 'json': out = {'name': name, 'timestamp': timestamp, 'value': value, 'dimensions': dimensions, 'value_meta': meta} print('%s' % json.dumps(out)) def main(): global verbose usage = ''' Usage: Set environment variables export OS_USERNAME=name export OS_PASSWORD=secret export OS_AUTH_URL=https://region-a.geo-1.identity.hpcloudsvc.com/v2.0 export OS_REGION_NAME=region-a.geo-1 export OS_PROJECT_ID=12345678912345 OR unset OS_AUTH_URL {name} --url=http://192.168.245.9:8070/v2.0 --token=HPAuth1234 Convenience Functions List metrics -- one metric per line. {name} metrics [--metric_name=] [--dim=:]... List metrics/measurements. This first gets all metrics matching the name and dimensions and then for each metric, gets the measurements. Unlike the merge_metrics feature of the Monasca API, this lists the metric associated with each measurement. However, the data may not be sorted by time. The output format is specified with --format. Note: the JSON format outputs JSON on each line -- the output as a whole is not JSON. The data is not necessarily in timestamp order. 
Even {name} find [--metric_name=] [--dim=:]... --start_time= - | yyyy:mm:ddThh:mm.000Z --end_time= 0 | -1 | -2 | yyyy:mm:ddThh:mm.000Z [--format compact | flat | json] [--merge_metrics] [--count=] List merged metrics/measurements. {name} merge [--metric_name=] [--dim=:]... --start_time= - | yyyy:mm:ddThh:mm.000Z --end_time= 0 | -1 | -2 | yyyy:mm:ddThh:mm.000Z [--format= compact | flat | json] [--count=] Tail metrics/measurements. This is similar to find except that it runs continuously. The timestamps may not be in order. NOTE: it will not show metrics posted less than two minutes ago. {name} tail [--metric_name=] [--dim=:]... --format compact | flat | json Aggregate measurements. This prints the average, min, max, total and count of a given metric. {name} aggregate -metric_name= [--dim=:]... Monasca API Wrappers These are thin wrappers for the corresponding API function. Get versions {name} versions Post a metric/measurement: POST /metrics API {name} post_metric --metric_name= --value=123.45 [--dim=:]... [--value_meta=:]... List metrics: GET /metrics API {name} metrics_api [--metric_name=] [--dim=:]... Get measurements. GET /metrics/measurements {name} meas --metric_name= [--dim=:]... --start_time= - | yyyy:mm:ddThh:mm.000Z --end_time= 0 | - | yyyy:mm:ddThh:mm.000Z --offset=yyyy:mm:ddThh:mm.000Z [--merge_metrics] Examples: {name} post_metric --metric_name=disk_read_bytes_count --dim=az:2 --dim=instance_id:2741581 --value=48523.0 --value_meta=msg:hi_there {name} find --dim=service:object-storage --metric_name=swiftlm.swift.swift_services --start_time=-2 --end_time=0 --format=compact {name} tail --dim=service:object-storage --dim=hostname:standard-ccp-c1-m1-mgmt {name} aggregate --metric_name=swiftlm.avg_latency_sec --dim=component:rest-api --start_time=-100000 --end_time=0 '''.format(name='swiftlm-monasca') https_proxy = environ.get('https_proxy', None) if https_proxy: print 'I suggest you unset https_proxy and try again.' 
sys.exit(1) parser = OptionParser(usage=usage) parser.add_option('--metric_name', dest='metric_name', help='The metric name') parser.add_option('--value', dest='value', help='Value (as integer/float)') parser.add_option('--dim', dest='dim_list', action='append', default=None, help='A dimension. Format as name:value.' ' For multiple dimensions, repeat --dim') parser.add_option('--value_meta', dest='vm_list', action='append', default=None, help='Value meta. Format name:value') parser.add_option('--start_time', dest='start_time', default=None, help='Start time. Express as negative minutes ago' ' or as UTC. Examples: -2,' ' 2015-11-24T10:42:29.000Z') parser.add_option('--end_time', dest='end_time', default='0', help='End time. Same format as --start-time except also' ' accepts "0" to mean "now"') parser.add_option('--offset', dest='offset', default=None, help='Offset as used in Monasca a API. Not needed by' ' the convenience functions') parser.add_option('--merge_metrics', dest='merge_metrics', default=False, action='store_true', help='As used by Monasca API') parser.add_option('--count', dest='count', default=None, help='Number of measurements to get (per metric)') parser.add_option('--format', dest='fmt', default='flat', help='Output format (some functions only). Options' ' are "compact", "flat" or "json". Default is' ' "flat". 
') parser.add_option('--url', dest='jahmon_url', default=None, help='Monasca endpoint (use with --token).') parser.add_option('--token', dest='jahmon_token', default=None, help='A token (used with --jahmon_url)') parser.add_option('--verbose', dest='verbose', action='store_true', default=False) (options, args) = parser.parse_args() os_username = environ.get('OS_USERNAME', None) os_password = environ.get('OS_PASSWORD', None) os_region_name = environ.get('OS_REGION_NAME', None) os_auth_url = environ.get('OS_AUTH_URL', None) os_project_id = environ.get('OS_TENANT_ID', environ.get('OS_PROJECT_ID', None)) os_project_name = environ.get('OS_TENANT_NAME', environ.get('OS_PROJECT_NAME', None)) os_user_id = environ.get('OS_USER_ID', None) os_user_domain_name = environ.get('OS_USER_DOMAIN_NAME', None) os_user_domain_id = environ.get('OS_USER_DOMAIN_ID', None) os_project_domain_name = environ.get('OS_PROJECT_DOMAIN_NAME', None) os_project_domain_id = environ.get('OS_PROJECT_DOMAIN_ID', None) if options.jahmon_url: os_username = os_password = os_region_name = os_auth_url = None os_project_id = None if not len(args) == 1: print 'Invalid number of arguments' sys.exit(1) if not options.dim_list: options.dim_list = [] dimensions = {} for dim in options.dim_list: name, val = dim.split(':', 1) dimensions[name] = val if not options.vm_list: options.vm_list = [] value_meta = {} for vm in options.vm_list: name, val = vm.split(':', 1) value_meta[name] = val if options.start_time: if options.start_time == '0': pass elif options.start_time.startswith('-'): try: float(options.start_time) except ValueError: print('Invalid --start_time') sys.exit(1) else: try: datetime.datetime.strptime(options.start_time, '%Y-%m-%dT%H:%M:%S.%fZ') except ValueError: print('Invalid --start_time') sys.exit(1) if options.end_time: if options.end_time == '0': pass elif options.end_time.startswith('-'): try: float(options.end_time) except ValueError: print('Invalid --end_time') sys.exit(1) else: try: 
datetime.datetime.strptime(options.end_time, '%Y-%m-%dT%H:%M:%S.%fZ') except ValueError: print('Invalid --end_time') sys.exit(1) verbose = options.verbose measurement_count = None if options.count: measurement_count = int(options.count) try: print_verbose('Setup...') conn = JahmonConnection(auth_url=os_auth_url, jahmon_url=options.jahmon_url, jahmon_token=options.jahmon_token, username=os_username, password=os_password, project_id=os_project_id, project_name=os_project_name, region_name=os_region_name, user_domain_name=os_user_domain_name, user_domain_id=os_user_domain_id, project_domain_name=os_project_domain_name, project_domain_id=os_project_domain_id, user_id=os_user_id) except JahmonClientException as err: print('...failed; Got %s code; reason: %s' % (err.http_status, err)) sys.exit(1) try: print_verbose('Authenticating...') conn.get_versions() except JahmonClientException as err: print '...failed; Got %s code; reason: %s' % (err.http_status, err) sys.exit(1) if os_auth_url: print_verbose('...Monasca endpoint: %s' % conn.url) print_verbose('...token: %s' % conn.token) print_verbose('...after %s attempts' % conn.attempts) else: print_verbose('...using %s with token %s' % (conn.url, conn.token)) if args[0].lower().startswith('post_metric'): try: print_verbose('POST /metric...') conn.post_metric(options.metric_name, options.value, dimensions=dimensions, value_meta=value_meta, timestamp=None, for_project=None) except Exception as err: print('...failed; reason: %s' % err) sys.exit(1) print_verbose('...status: %s' % conn.status_code) print_verbose('...after %s attempts' % conn.attempts) elif args[0].lower().startswith('versions'): try: print_verbose('GET /...') reply = conn.get_versions() print json.dumps(reply, indent=2, separators=(',', ': ')) except Exception as err: print('...failed; reason: %s' % err) sys.exit(1) print_verbose('...status: %s' % conn.status_code) print_verbose('...after %s attempts' % conn.attempts) elif 
args[0].lower().startswith('metrics_api'): try: print_verbose('GET /metrics...') reply = conn.get_metrics_api(options.metric_name, dimensions=dimensions, for_project=None) print(json.dumps(reply, indent=2, separators=(',', ': '))) except Exception as err: print('...failed; reason: %s' % err) sys.exit(1) print_verbose('...status: %s' % conn.status_code) print_verbose('...after %s attempts' % conn.attempts) elif args[0].lower().startswith('metrics'): try: metrics = conn.get_metrics(options.metric_name, dimensions=dimensions, for_project=None) for metric in metrics: print('%s %s' % (metric.get('name'), json.dumps(metric.get('dimensions')))) except Exception as err: print('...failed; reason: %s' % err) sys.exit(1) elif args[0].lower().startswith('meas'): try: print_verbose('GET /metrics/measurements...') merge_metrics = options.merge_metrics reply = conn.get_measurements_api(options.metric_name, options.start_time, options.end_time, dimensions=dimensions, for_project=None, merge_metrics=merge_metrics, offset=options.offset) print(json.dumps(reply, indent=2, separators=(',', ': '))) except Exception as err: print('...failed; reason: %s' % err) print_verbose('...status: %s' % conn.status_code) print_verbose('...after %s attempts' % conn.attempts) elif args[0].lower().startswith('find'): try: metrics = conn.get_metrics(options.metric_name, dimensions=dimensions) for metric in metrics: items = conn.get_measurements(metric, options.start_time, options.end_time, for_project=None, count=measurement_count) for item in items: print_result(item, options.fmt) except Exception as err: print('...failed; reason: %s' % err) sys.exit(1) elif args[0].lower().startswith('merge'): try: metric = {'name': options.metric_name, 'dimensions': dimensions} items = conn.get_measurements(metric, options.start_time, options.end_time, for_project=None, merge_metrics=True, count=measurement_count) for item in items: print_result(item, options.fmt) except Exception as err: print('...failed; reason: 
%s' % err) sys.exit(1) elif args[0].lower().startswith('aggregate'): try: min_val = None max_val = None total = 0.0 count = 0 metrics = conn.get_metrics(options.metric_name, dimensions=dimensions) for metric in metrics: items = conn.get_measurements(metric, options.start_time, options.end_time, for_project=None, merge_metrics=True, count=measurement_count) for item in items: count += 1 value = item.get('value') total += value if min_val is None: min_val = max_val = value if value < min_val: min_val = value if value > max_val: max_val = value avg = float(total) / count print('avg: %s min-max: %s-%s total: %s count: %s' % (avg, min_val, max_val, total, count)) except Exception as err: print('...failed; reason: %s' % err) sys.exit(1) elif args[0].lower().startswith('tail'): try: metrics = [] items = conn.get_metrics(options.metric_name, dimensions=dimensions) for item in items: metrics.append(item) start_time = conn.utctime(-4) # First time through, show more while True: # A measurement may be posted well after its timestamp # Hence the tailed results are always two minutes behind. end_time = conn.utctime(-2) for metric in metrics: items = conn.get_measurements(metric, start_time, end_time, for_project=None) for item in items: print_result(item, options.fmt) sleep(60) start_time = conn.utctime(-3) except Exception as err: print('...failed; reason: %s' % err) sys.exit(1) else: print('Invalid command. Try --help') sys.exit(1) if __name__ == "__main__": main() 07070100000034000081ED000003E800000064000000015BE06E0300002752000000000000000000000000000000000000004400000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/log_tailer.py#!/usr/bin/python # (c) Copyright 2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License.
You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # import ConfigParser from ConfigParser import NoSectionError, NoOptionError import logging from optparse import OptionParser import json import time from swiftlm.utils.log_tailer import LogTailer, parse_proxy_log_message, \ AccessStatsRecorder from swiftlm.utils.utility import dump_swiftlm_uptime_data, get_logger, \ sleep_interval usage = """ Program to tail a swift log file, extract messages coming from the proxy-logging middleware and derive stats such as number of operations, bytes put and bytes get. The stats are worked out as a total and for each project. At the end of each cycle, it writes the stats as metrics to a metrics json file. The swiftlm plugin will send these to Monasca. Usage: swiftlm-log-tail --config <config-file> Options: --config <config-file> Defaults to /etc/swiftlm/swiftlm-scan.conf Example configuration file: [log-tailer] tailed_log_file=/var/log/swift/swift.log metric_file=/var/cache/swiftlm/access_log_metrics.json interval=60 monasca_agent_interval=30 reseller_prefixes=AUTH_,SERVICE_ [logging] log_level = info log_facility = LOG_LOCAL0 log_format = '%(name)s: %(message)s' where: tailed_log_file Name of file to tail. Defaults to /var/log/swift/swift.log metric_file Name of file to write metrics. Defaults to /var/cache/swiftlm/access_log_metrics.json interval Cycle time (time between reads of log file). Defaults to 60 seconds monasca_agent_interval How often Monasca Agent runs. Defaults to 30 seconds (it may run more often than this without harm). reseller_prefixes Project prefix to make a Swift account name.
Defaults to AUTH_ """ def make_measurements(metric_name_prefix, stats, timestamp, dimensions=None): """ Convert stats structure into metric format :param metric_name_prefix: first part of metric name :param stats: the stats :param timestamp: timestamp :param dimensions: dimensions (optional) :return: list of metric measurements """ if dimensions is None: dimensions = {} metrics = [] for stat_item, metric_name in [('ops', 'ops'), ('bytes_put', 'put.bytes'), ('bytes_get', 'get.bytes')]: dims = dict(dimensions) dims.update({'service': 'object-storage'}) metrics.append({'metric': metric_name_prefix + metric_name, 'value': stats.get(stat_item), 'timestamp': timestamp, 'dimensions': dims}) return metrics def purge_old_measurements(metrics, interval, monasca_agent_interval): """ Purge old measurements The Monasca agent may run less often than our interval. If we only kept the latest measurements in the metric json file, the agent might miss them. So instead of overwriting, we append metrics. This function purges metrics when we're convinced that the agent has read them. Since we're keeping measurements from several cycles in the metric json file, the reader must discard duplicates (the Swiftlm Monasca plugin discards duplicates). :param metrics: List of metrics :param interval: How often we generate new measurements :param monasca_agent_interval: How often the Monasca agent runs; measurements are retained for interval + monasca_agent_interval seconds :return: metrics is updated in place """ retain_time = interval + monasca_agent_interval for metric in list(metrics): # Note: iterate over a copy if (metric.get('timestamp') + retain_time) < time.time(): metrics.remove(metric) def run_forever(log_file_name, interval, metric_file, reseller_prefixes, logger, monasca_agent_interval): """ The main cycle loop :param log_file_name: name of file we are tailing :param interval: how often we report metrics :param metric_file: file to dump metrics into :param reseller_prefixes: list of account prefixes to process :param logger: a logger :param monasca_agent_interval: how often the Monasca agent runs """ logger.info('Starting.
Reading from: %s' % log_file_name) # Get into sync with wake up interval WAKE_UP_TIME = 60 time.sleep(sleep_interval(interval, time.time(), WAKE_UP_TIME)) while True: try: log_tail = LogTailer(log_file_name) break except IOError as err: if err.errno == 2: # Log file does not yet exist time.sleep(sleep_interval(interval, time.time(), WAKE_UP_TIME)) else: raise err cycle = 10 metric_data = [] while True: try: # Sleep until next wake up time.sleep(sleep_interval(interval, time.time(), WAKE_UP_TIME)) # Timestamp means the metrics are measurements of the data # gathered in the *last* time interval (i.e., timestamp is the # end of the cycle) timestamp = time.time() # Read lines written to the log since we last read the file # and process lines to extract stats. stats = AccessStatsRecorder() for line in log_tail.lines(): result = parse_proxy_log_message( line, reseller_prefixes=reseller_prefixes) if isinstance(result, dict): stats.record_op(result.get('verb'), result.get('http_status'), result.get('bytes_transferred'), project=result.get('project'), container=result.get('container'), obj=result.get('obj')) # Convert stats into metric measurements total_metrics = make_measurements('swiftlm.access.host.operation.', stats.get_stats(), timestamp) for measurement in total_metrics: metric_data.append(measurement) # Occasionally, log the totals cycle += 1 if cycle >= 10: for metric in total_metrics: logger.info('Metric: %s' % json.dumps(metric)) cycle = 0 for project in stats.get_projects(): project_metrics = make_measurements( 'swiftlm.access.host.operation.project.', project.get_stats(), timestamp, dimensions={'tenant_id': project.get_stats().get('name')}) for measurement in project_metrics: metric_data.append(measurement) # Record that we processed data without error metric_data.append({'metric': 'swiftlm.access.host.operation.status', 'value': 0, 'timestamp': timestamp, 'dimensions': {'service': 'object-storage'}, 'value_meta': {'msg': 'OK'}}) except Exception as err: # noqa 
metric_data = [] metric_data.append({'metric': 'swiftlm.access.host.operation.status', 'value': 2, 'timestamp': time.time(), 'dimensions': {'service': 'object-storage'}, 'value_meta': {'msg': str(err)}}) purge_old_measurements(metric_data, interval, monasca_agent_interval) dump_swiftlm_uptime_data(metric_data, metric_file, logger, lock_timeout=2) def main(): parser = OptionParser(usage=usage) parser.add_option('--config', dest='config_file', default='/etc/swiftlm/swiftlm-scan.conf') (options, args) = parser.parse_args() config = ConfigParser.RawConfigParser() prefix_list = 'AUTH_' interval = 60 monasca_agent_interval = 30 metric_file = '/var/cache/swiftlm/access_log_metrics.json' log_file_name = '/var/log/swift/swift.log' if config.read(options.config_file): try: interval = int(config.get('log-tailer', 'interval')) except (NoSectionError, NoOptionError): pass try: metric_file = config.get('log-tailer', 'metric_file') except (NoSectionError, NoOptionError): pass try: log_file_name = config.get('log-tailer', 'tailed_log_file') except (NoSectionError, NoOptionError): pass try: monasca_agent_interval = int(config.get('log-tailer', 'monasca_agent_interval')) except (NoSectionError, NoOptionError): pass try: prefix_list = config.get('log-tailer', 'reseller_prefixes') except (NoSectionError, NoOptionError): pass try: logger = get_logger(dict(config.items('logging')), name='log-tailer') except (NoSectionError, NoOptionError): logging.basicConfig(level=logging.DEBUG) logger = logging else: logging.basicConfig(level=logging.DEBUG) logger = logging reseller_prefixes = prefix_list.strip().split(',') run_forever(log_file_name, interval, metric_file, reseller_prefixes, logger, monasca_agent_interval) if __name__ == '__main__': main() 07070100000035000081A4000003E800000064000000015BE06E030000283F000000000000000000000000000000000000004300000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/memcached.py # (c) Copyright 2015 Hewlett Packard Enterprise Development LP # (c)
Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # import ConfigParser from ConfigParser import NoSectionError, NoOptionError from optparse import OptionParser import sys import time from swift.common.memcached import (MemcacheRing, CONN_TIMEOUT, POOL_TIMEOUT, IO_TIMEOUT, TRY_COUNT) from swiftlm.utils.utility import KeyNames usage = """ Program to test the memcached ring. It uses the Swift client so the results may not be applicable to other memcached clients. test_conns repeatedly connects and does a get operation -- and reports on latency to connect/get. test_gets connects once, and repeatedly does get operations. The latency reported is the time to do the get operation. Usage: swiftlm-memcached set blah "hello world" swiftlm-memcached get blah swiftlm-memcached test_conns [--keys <number>] [--run_time <seconds>] swiftlm-memcached test_gets [--keys <number>] [--run_time <seconds>] Options: --config <file> Defaults to /etc/swift/memcache.conf. You typically need to run with sudo to access /etc/swift/memcache.conf. --servers <host>:<port>,<host>:<port>,... Use these servers instead of from --config. You can use --servers if you do not have a memcache.conf file. --keys <number> In test_conns and test_gets, set this number of unique keys. Default is 100 keys. --run_time <seconds> In test_conns and test_gets, run for this number of seconds. Defaults to 30 seconds.
""" class Buckets(object): """ Track latency average and pattern """ def __init__(self): self.slots = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 2.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0] self.bucket = {} for slot in self.slots: self.bucket[str(slot)] = 0 self.bucket['>'] = 0 self.total = 0.0 self.count = 0 self.min = None self.max = 0.0 def record(self, latency): if self.min is None: self.min = latency self.max = latency if self.min > latency: self.min = latency if self.max < latency: self.max = latency off_end = True for slot in self.slots: if latency <= slot: self.bucket[str(slot)] += 1 off_end = False break if off_end: self.bucket['>'] += 1 self.total += latency self.count += 1 def __repr__(self): lines = [] if self.count > 0: lines.append('Average: %s ms' % float(self.total / self.count)) headings = '' for slot in self.slots: headings += ' %s ' % slot headings += ' > ' lines.append(headings) values = '' for slot in self.slots: # Does not align once slot is double-digit; not worth effort to fix values += ' %5d ' % self.bucket[str(slot)] values += ' %5d ' % self.bucket['>'] lines.append(values) lines.append('Range %s ms - %s ms' % (self.min, self.max)) return '\n'.join(lines) def memcached_main(action, options, key=None, value=None): memcache_ring = MemcacheRing( options['servers'], connect_timeout=options['connect_timeout'], pool_timeout=options['pool_timeout'], tries=options['tries'], io_timeout=options['io_timeout'], allow_pickle=(options['serialization_format'] == 0), allow_unpickle=(options['serialization_format'] <= 1), max_conns=options['max_conns']) if action == 'set': start_time = time.time() memcache_ring = get_memcache_ring(options) memcache_ring.set(key, value, time=600) print('Duration: %s ms' % str((time.time() - start_time) * 1000)) elif action == 'get': start_time = time.time() memcache_ring = get_memcache_ring(options) value, latency = memcache_get(memcache_ring, key) print('Duration: %s ms' % str((time.time() - start_time) * 1000)) print('Value: 
%s' % value) elif action == 'test_conns': test_conns(options) elif action == 'test_gets': test_gets(memcache_ring, options) def get_memcache_ring(options): memcache_ring = MemcacheRing( options['servers'], connect_timeout=options['connect_timeout'], pool_timeout=options['pool_timeout'], tries=options['tries'], io_timeout=options['io_timeout'], allow_pickle=(options['serialization_format'] == 0), allow_unpickle=(options['serialization_format'] <= 1), max_conns=options['max_conns']) return memcache_ring def test_conns(options): # Set a value for every key memcache_ring = get_memcache_ring(options) keys = KeyNames(options['number_of_keys']) for key in keys.get_keys(): memcache_ring.set(key, key, time=options['run_time'] + 100) # Read them back until time to stop start_time = time.time() count = 0 buckets = Buckets() for key in keys.get_keys_forever(): count += 1 # This does not connect del memcache_ring memcache_ring = get_memcache_ring(options) value, latency = memcache_get(memcache_ring, key) buckets.record(latency) if time.time() - start_time > options['run_time']: break print('Duration: %s sec' % str(time.time() - start_time)) print('Average for %s cycles: %s ms' % ( count, str((time.time() - start_time) * 1000 / count))) print('Latency for: connect + get') print('%s' % buckets) def test_gets(memcache_ring, options): # Set a value for every key keys = KeyNames(options['number_of_keys']) for key in keys.get_keys(): memcache_ring.set(key, key, time=options['run_time'] + 100) # Read them back until time to stop start_time = time.time() count = 0 buckets = Buckets() for key in keys.get_keys_forever(): count += 1 value, latency = memcache_get(memcache_ring, key) buckets.record(latency) if time.time() - start_time > options['run_time']: break print('Duration: %s sec' % str(time.time() - start_time)) print('Average for %s cycles: %s ms' % ( count, str((time.time() - start_time) * 1000 / count))) print('Latency per get:') print('%s' % buckets) def 
memcache_get(memcache_ring, key): start_time = time.time() value = memcache_ring.get(key) latency = time.time() - start_time return value, latency * 1000 def main(): parser = OptionParser(usage=usage) parser.add_option('--config', dest='config_file', default=None) parser.add_option('--servers', dest='servers', default=None) parser.add_option('--run_time', dest='run_time', default=30) parser.add_option('--keys', dest='keys', default=100) (options, args) = parser.parse_args() if not options.config_file: options.config_file = '/etc/swift/memcache.conf' main_options = {} memcache_options = {} main_options['serialization_format'] = 2 main_options['max_conns'] = 2 if options.config_file: memcache_conf = ConfigParser.RawConfigParser() if memcache_conf.read(options.config_file): # if memcache.conf exists we'll start with those base options try: memcache_options = dict(memcache_conf.items('memcache')) except NoSectionError: pass try: memcache_servers = \ memcache_conf.get('memcache', 'memcache_servers') except (NoSectionError, NoOptionError): print('Missing memcache_servers in %s' % options.config_file) sys.exit(1) try: main_options['serialization_format'] = int( memcache_conf.get('memcache', 'memcache_serialization_support')) except (NoSectionError, NoOptionError, ValueError): pass try: new_max_conns = \ memcache_conf.get('memcache', 'memcache_max_connections') main_options['max_conns'] = int(new_max_conns) except (NoSectionError, NoOptionError, ValueError): pass elif not options.servers: print('unable to read %s' % options.config_file) sys.exit(1) if options.servers: memcache_servers = options.servers servers = [s.strip() for s in memcache_servers.split(',') if s.strip()] main_options['servers'] = servers main_options['connect_timeout'] = float(memcache_options.get( 'connect_timeout', CONN_TIMEOUT)) main_options['pool_timeout'] = float(memcache_options.get( 'pool_timeout', POOL_TIMEOUT)) main_options['tries'] = int(memcache_options.get('tries', TRY_COUNT)) 
main_options['io_timeout'] = float(memcache_options.get('io_timeout', IO_TIMEOUT)) main_options['run_time'] = float(options.run_time) main_options['number_of_keys'] = int(options.keys) if len(args) == 0: print('Missing command') sys.exit(1) action = args[0] if len(args) > 1: key = args[1] else: key = None if len(args) > 2: value = args[2] else: value = None memcached_main(action, main_options, key=key, value=value) if __name__ == '__main__': main() 07070100000036000081A4000003E800000064000000015BE06E0300001F94000000000000000000000000000000000000004C00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/probe_100_continue.py # (c) Copyright 2015 Hewlett Packard Enterprise Development LP # (c) Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # from httplib import HTTPSConnection, HTTPConnection, HTTPResponse, HTTPMessage from httplib import CONTINUE, _UNKNOWN from optparse import OptionParser import os import socket import sys from urlparse import urlparse usage = """ Program to test a Swift system to see if it responds with a 100 Continue response when the Expect:100-continue request header is specified. The reason for testing is that if the system does not respond, the client has to timeout before sending the body/content. As a result such clients suffer lower performance when uploading objects. Not all clients use Expect:100-continue. Libcurl is known to use the feature. Swift itself responds to Expect:100-continue. 
However, front-ends have been known not to respond. So this is really a test of the SSL termination or load balancer that front-ends Swift. Usage: probe_100_continue.py --os-auth-token=<token> --os-storage-url=https://<host>:8080/v1/<account> """ class ExpectHTTPResponse(HTTPResponse): def __init__(self, sock, debuglevel=0, strict=0, method=None): self.sock = sock self.fp = sock.makefile('rb') self.debuglevel = debuglevel self.strict = strict self._method = method self.msg = None # from the Status-Line of the response self.version = _UNKNOWN # HTTP-Version self.status = _UNKNOWN # Status-Code self.reason = _UNKNOWN # Reason-Phrase self.chunked = _UNKNOWN # is "chunked" being used? self.chunk_left = _UNKNOWN # bytes left to read in current chunk self.length = _UNKNOWN # number of bytes left in response self.will_close = _UNKNOWN # conn will close at end of response def expect_response(self): if self.fp: self.fp.close() self.fp = None self.fp = self.sock.makefile('rb', 0) version, status, reason = self._read_status() if status != CONTINUE: self._read_status = lambda: (version, status, reason) self.begin() else: self.status = status self.reason = reason.strip() self.version = 11 self.msg = HTTPMessage(self.fp, 0) self.msg.fp = None class ExpectHTTPSConnection(HTTPSConnection): response_class = ExpectHTTPResponse def putrequest(self, method, url, skip_host=0, skip_accept_encoding=0): self._method = method self._path = url return HTTPSConnection.putrequest(self, method, url, skip_host, skip_accept_encoding) def getexpect(self): response = ExpectHTTPResponse(self.sock, strict=self.strict, method=self._method) response.expect_response() return response class ExpectHTTPConnection(HTTPConnection): response_class = ExpectHTTPResponse def putrequest(self, method, url, skip_host=0, skip_accept_encoding=0): self._method = method self._path = url return HTTPConnection.putrequest(self, method, url, skip_host, skip_accept_encoding) def getexpect(self): response = ExpectHTTPResponse(self.sock,
strict=self.strict, method=self._method) response.expect_response() return response def http_connect(scheme, ipaddr, port, method, path, headers=None, timeout=10): if scheme == 'https': conn = ExpectHTTPSConnection('%s:%s' % (ipaddr, port), timeout=timeout) else: conn = ExpectHTTPConnection('%s:%s' % (ipaddr, port), timeout=timeout) conn.path = path conn.putrequest(method, path, skip_host=(headers and 'Host' in headers)) if headers: for header, value in headers.iteritems(): conn.putheader(header, str(value)) conn.endheaders() return conn def probe_100_continue(url, token): """ Test Expect: 100-continue """ # Get endpoint IP address, port, etc. urlparts = urlparse(url) ipaddr = socket.gethostbyname(urlparts.hostname) port = urlparts.port scheme = urlparts.scheme if scheme == 'https' and not port: port = 443 if scheme == 'http' and not port: port = 80 # We first need to create a container try: headers = {} headers['X-Auth-Token'] = token path = urlparts.path + '/100-continue-test' print('Creating container: PUT %s' % path) conn = http_connect(scheme, ipaddr, port, 'PUT', path, headers=headers, timeout=6) except Exception as err: print('Failed to connect to: %s:%s\nReason: %s' % (ipaddr, port, err)) print('\nTest result: UNKNOWN') sys.exit(1) try: response = conn.getexpect() except Exception as err: print('Failed to get response: %s' % err) print('\nTest result: UNKNOWN') sys.exit(1) if response.status not in (200, 201, 202, 204): print('Received: HTTP %s %s' % (response.status, response.reason)) print('%s' % response.msg) print(response.read()) print('\nTest result: UNKNOWN') sys.exit(1) # Create object using expect:100-continue # The request must be a PUT on an object and must have a non-zero # Content-Length. We set Content-Length, but don't bother to actually # send data -- we only need to see the 100 response. 
try: headers = {} headers['X-Auth-Token'] = token headers['Expect'] = '100-continue' headers['Content-Length'] = 1200 path = urlparts.path + '/100-continue-test/100-continue-test-obj' print('Connecting to ipaddr: %s port: %s' % (ipaddr, port)) conn = http_connect(scheme, ipaddr, port, 'PUT', path, headers=headers, timeout=3) except Exception as err: print('Failed to connect: %s' % err) print('\nTest result: UNKNOWN') sys.exit(1) # Wait for response and check it. try: print('Waiting for response...') response = conn.getexpect() except Exception as err: print('Failed to get response: %s' % err) print('\nTest result: UNKNOWN') sys.exit(1) if response.status == 100: print('Received: HTTP 100 Continue') print('\nTest result: PASSED') sys.exit(0) else: print('Received: HTTP %s %s' % (response.status, response.reason)) try: print('%s' % response.msg) print(response.read()) except Exception as err: print('Unable to read response body') print('\nTest result: FAILED') sys.exit(1) def main(): parser = OptionParser(usage=usage) parser.add_option('--os-storage-url', dest='os_storage_url', default=None) parser.add_option('--os-auth-token', dest='os_auth_token', default=None) (options, args) = parser.parse_args() url = None token = None url = options.os_storage_url token = options.os_auth_token if not (url and token): print('Please specify --os-storage-url and --os-auth-token\n' 'Use --help to show usage.') sys.exit(1) probe_100_continue(url, token) if __name__ == '__main__': main() 07070100000037000081A4000003E800000064000000015BE06E030000176B000000000000000000000000000000000000004000000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/runner.py# encoding: utf-8 # (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. 
You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # from __future__ import print_function import argparse import pkg_resources import json import sys import traceback import yaml from swiftlm.utils.metricdata import MetricData from swiftlm.utils.values import Severity from swiftlm.utils.utility import SwiftlmCheckFailure, lock_file def display_json(metrics, pretty): dumped_metrics = '' if pretty: kwargs = { 'sort_keys': True, 'indent': 2, } dumped_metrics = json.dumps(metrics, **kwargs) else: items = [] for item in metrics: items.append(json.dumps(item)) dumped_metrics = '[\n' dumped_metrics += ',\n'.join(items) dumped_metrics += '\n]\n' return dumped_metrics def display_yaml(metrics, pretty): yaml.add_representer(Severity, Severity.yaml_repr, yaml.SafeDumper) kwargs = {} if pretty: kwargs = {'default_flow_style': False} return(yaml.safe_dump(metrics, **kwargs)) FORMATS = { 'json': display_json, 'yaml': display_yaml, } def construct_parser(plugins): parser = argparse.ArgumentParser(description='XXX') # Create a flag for each plugin that adds the matching function to the # selected list if it appears on the command line. selection_group = parser.add_argument_group( 'Available Checks', 'Select one or more of the available checks to run as a subset.' ) for name, unloaded_func in plugins.items(): func = unloaded_func.load() help_string = func.__doc__ or 'Reserved for future use.' selection_group.add_argument( '--' + name, dest='selected', action='append_const', const=func, help=help_string ) parser.add_argument( '--format', choices=FORMATS.keys(), default='json', help='Format output (default: %(default)s).' 
) parser.add_argument( '-p', '--pretty', action='store_true', help='Format output in a more readable way.' ) parser.add_argument( '-v', '--verbose', action='count' ) parser.add_argument( '--filename', metavar='<file>', default=False, help='File to store scan results into' ) return parser def parse_args(): ps = pkg_resources.get_entry_map('swiftlm', 'swiftlm.plugins') p = construct_parser(ps) args = p.parse_args() # We make the common case easy: no selected flags indicates that we should # run all diagnostics. if args.selected is None: args.selected = [f.load() for f in ps.values()] return args def main(): args = parse_args() metrics = [] for func in args.selected: try: r = func() if isinstance(r, list) and r and isinstance(r[0], MetricData): metrics.extend([result.metric() for result in r]) elif isinstance(r, MetricData): metrics.append(r.metric()) except SwiftlmCheckFailure as err: r = MetricData.single('check.failure', Severity.fail, '{error} | Failed with: {check}', dimensions={'component': 'swiftlm-scan', 'service': 'object-storage'}, msgkeys={'check': func.__module__, 'error': str(err)}) metrics.append(r.metric()) except: # noqa t, v, tb = sys.exc_info() backtrace = ' '.join(traceback.format_exception(t, v, tb)) r = MetricData.single('check.failure', Severity.fail, '{error} | Failed with: {check}', dimensions={'component': 'swiftlm-scan', 'service': 'object-storage'}, msgkeys={'check': func.__module__, 'error': backtrace.replace('\n', ' ')}) metrics.append(r.metric()) # There is no point in reporting multiple measurements of # swiftlm.check.failure metric in same cycle.
check_failures_found = [] for metric in metrics: if metric.get('metric') == 'swiftlm.check.failure': check_failures_found.append(metric) if check_failures_found: # Remove all except one instance for metric in check_failures_found[:-1]: metrics.remove(metric) else: r = MetricData.single('check.failure', Severity.ok, 'ok', dimensions={'component': 'swiftlm-scan', 'service': 'object-storage'}) metrics.append(r.metric()) dumped_metrics = FORMATS[args.format](metrics, args.pretty) out_stream = sys.stdout if args.filename: try: with lock_file(args.filename, 2, unlink=False) as cf: cf.truncate() cf.write(dumped_metrics) except Exception as err: print('ERROR: %s' % err) sys.exit(1) else: out_stream = sys.stdout out_stream.write(dumped_metrics) if __name__ == '__main__': main() 07070100000038000081ED000003E800000064000000015BE06E03000010C7000000000000000000000000000000000000003F00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/scout.py# Copyright (c) 2014 OpenStack Foundation. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """ Report status in a program-friendly format """ from __future__ import print_function import json import optparse import sys import yaml from swiftlm.utils.scout import SwiftlmScout def scout_main(): """ Gather data and print it.
""" usage = ''' usage: {prog} [ --conf=] [--all] [--aggregate] [--path [--ring_type=]] [--timeout=] [--verbose] [--outformat= yaml | json] Examples: {prog} --all --outformat=yaml {prog} --path diskusage --verbose '''.format(prog='swiftlm-scout') args = optparse.OptionParser(usage) args.add_option('--all', action='store_true', default=False, help='Gather all known data') args.add_option('--aggregate', action='store_true', default=False, help='Only gather data for aggregation') args.add_option('--path', default=None, help='Gather a specific item. Typically used in' ' testing') args.add_option('--ring_type', default=None, help='Ring to use with --path') args.add_option('--outformat', type='string', metavar='FORMAT', help='Format of output.' ' Supported values are:' ' "yaml" (default), "json"', default='yaml') args.add_option('--timeout', type='int', metavar='SECONDS', help='Time to wait for a response from a server', default=5) args.add_option('--conf', default='/etc/swiftlm/scout.conf', help='Reserved for future use') args.add_option('--verbose', action='store_true', help='Print verbose info. 
Useful to troubleshoot.') args.add_option('--show_errors', action='store_true', default=False, help='Do not suppress errors') options, arguments = args.parse_args() actions = ['all'] if options.all: actions = ['all'] if options.aggregate: actions = ['aggregate'] if options.path: actions = ['path'] path_ring_type = 'all' recon_path = options.path if options.ring_type: path_ring_type = options.ring_type suppress_errors = True if options.show_errors: suppress_errors = False if options.verbose: suppress_errors = False recon_data = SwiftlmScout({}, suppress_errors=suppress_errors, verbose=options.verbose, timeout=options.timeout) if 'all' in actions: recon_data.scout_all() if 'aggregate' in actions: recon_data.scout_aggregate() if 'path' in actions: recon_data.path(recon_path, path_ring_type) collected = recon_data.get_results() if options.outformat == 'json': print(json.dumps(collected)) elif options.outformat == 'yaml': print(yaml.safe_dump(collected, allow_unicode=True, default_flow_style=False)) else: print('Invalid value for --outformat') sys.exit(1) def main(): try: scout_main() except KeyboardInterrupt: print('\n') if __name__ == '__main__': main() 07070100000039000081A4000003E800000064000000015BE06E0300009EF2000000000000000000000000000000000000004400000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/supervisor.py# encoding: utf-8 # (c) Copyright 2015, 2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the
# License for the specific language governing permissions and limitations
# under the License.
#

from optparse import OptionParser
import sys
import os
from yaml import safe_load, scanner

from swiftlm.rings.ardana_model import Consumes, ServersModel
from swiftlm.rings.ring_builder import RingBuilder, RingDelta
from swiftlm.rings.ring_model import RingSpecifications, \
    SwiftModelException, DriveConfigurations, DriveConfiguration

DEFAULT_ETC = '/etc/swiftlm'
DEFAULT_CONFIG_DIR = 'config'
DEFAULT_INPUT_VARS = 'input-model.yml'
DEFAULT_CP_SERVERS = 'control_plane_servers.yml'
DEFAULT_BUILDER_DIR = 'builder_dir'
DEFAULT_SWIFT_RING_BUILDER_CONSUMES = 'swift_ring_builder_consumes.yml'
DEFAULT_DEPLOY_DIR = 'deploy_dir'
DEFAULT_RING_DELTA = 'ring-delta.yml'
DEFAULT_OSCONFIG = 'drive_configurations'
DEFAULT_CONFIGURATION_DATA = 'configuration_data.yml'

usage = '''
% {cmd} --cloud --control-plane
        [--make-delta [--weight-step ] [--stop-on-warnings]]
        [--rebalance [--dry-run]]
        [--report [--detail=summary|full]]
        [--etc ]
        [--ring-delta [--format yaml|json]]
        [--help]

Examples:

Example 1: Create a ring-delta file given an input model and (possibly)
existing rings. The results are written to the ./ring-delta.yml file. If a
ring does not exist in ./builder_dir, the ring-delta file will contain
directives to create the ring.

    % {cmd} --make-delta --ring-delta ./ring-delta.yml --cloud standard
      --control-plane ccp

Example 2: Update and rebalance (usually existing) rings given a ring-delta
file. The command reads the ./ring-delta.yml file and updates the rings.
Add --dry-run to see what commands would be issued to swift-ring-builder:

    % {cmd} --rebalance --ring-delta ./ring-delta.yml --cloud standard
      --control-plane ccp

Example 3: Rebalance existing rings. This is similar to the above command
except the ring-delta defaults.
% {cmd} --rebalance --builder_dir ./builder_dir --cloud standard --control-plane ccp Example 4: Build, update and rebalance rings given an input model and (possibly) existing rings. This command compresses examples 1 and 2 above into a single command. % {cmd} --make-delta --rebalance --cloud standard --control-plane ccp Example 5: Rebalance rings after adding or removing a number of servers The --weight-step option prevents weights being moved too quickly on the new servers. --weight-step is only needed if it is not present in the ring-specifications. % {cmd} --make-delta --rebalance --weight-step 4.0 --cloud standard --control-plane ccp Example 6: Show the actions that would be performed if a --rebalance is performed using a ring delta file as specified by the --ring-delta option (built using a prior --make-delta). These variants give different levels of detail: % {cmd} --report --detail summary --ring-delta /tmp/ring-delta.yaml --cloud standard --control-plane ccp % {cmd} --report --detail full --ring-delta /tmp/ring-delta.yaml --cloud standard --control-plane ccp % {cmd} --make-delta --rebalance --dry-run --ring-delta /tmp/ring-delta.yaml --cloud standard --control-plane ccp '''.format(cmd=os.path.basename(__file__)) class CloudMultiSite(object): """ Manage multi-site aspects This class ties the models from several control planes into a coordinated view of the world. The purpose is to support Swift regions, where we've deployed two or more clouds independently. By copying data files from the "secondary" control planes, we can build rings on the "primary" control plane. We then copy the builder files/rings back to the secondary control planes. Most of the time, there is only one primary control plane (i.e., the system we are running on now, so there will only be one set of input data). 
The cloud model data is structured as follows: /etc/swiftlm///config/ - contains the model /etc/swiftlm///builder_dir/ - where we store builder files and rings for this system (as specified by the --cloud and --control-plane options) /etc/swiftlm//ring-delta.yml - where we store the ring delta The files in the config directory are as follows: input-model.yml -- contains legacy list of servers control_plane_servers.yml -- list of servers in this control plane swift_ring_builder_consumes.yml -- the network relationships configuration_data.yml -- the Swift configuration-data object drive_configurations/ -- directory containing files named /drive_configurations.yml """ def __init__(self, options): ''' Walk the etc structure to discover the control planes :param options: The options supplied to the swiftlm-ring-supervisor: --cloud: this cloud name --control_plane: this control plane --etc: usually /etc/swiftlm ''' self.my_cloud = options.cloud self.my_control_plane = options.control_plane self._paths = {} for cloud in [f for f in os.listdir(options.etc) if os.path.isdir(os.path.join(options.etc, f))]: if cloud == 'legacy_builder_dir': # We will leave legacy builder files in /etc/swiftlm # we assume no cloud will be called "legacy_builder_dir"! 
continue cloud_path = os.path.join(options.etc, cloud) for control_plane in [f for f in os.listdir(cloud_path) if os.path.isdir(os.path.join(cloud_path, f))]: control_plane_path = os.path.join(cloud_path, control_plane) config_dir_path = os.path.join(control_plane_path, DEFAULT_CONFIG_DIR) self._paths[(cloud, control_plane)] = { # Inputs 'input-model': os.path.join(config_dir_path, DEFAULT_INPUT_VARS), 'control_plane_servers': os.path.join( config_dir_path, DEFAULT_CP_SERVERS), 'swift_ring_builder_consumes': os.path.join( config_dir_path, DEFAULT_SWIFT_RING_BUILDER_CONSUMES), 'osconfig': os.path.join(config_dir_path, DEFAULT_OSCONFIG), 'configuration_data': os.path.join( config_dir_path, DEFAULT_CONFIGURATION_DATA), # Outputs 'ring-delta': (options.ring_delta or os.path.join(control_plane_path, DEFAULT_RING_DELTA)), 'builder_dir': os.path.join(control_plane_path, DEFAULT_BUILDER_DIR) } # Validate that we found a directory for the control plane # we're running on. found = False for cloud, control_plane in self._paths.keys(): if (cloud == self.my_cloud and control_plane == self.my_control_plane): found = True if not found and not options.unittest: sys.exit('Cannot find configuration files in %s' % os.path.join(options.etc, self.my_cloud, self.my_control_plane)) def path(self, cloud, control_plane): return self._paths[(cloud, control_plane)] def control_planes(self): return self._paths.keys() def main(): parser = OptionParser(usage=usage) parser.add_option('--etc', dest='etc', default=DEFAULT_ETC, help='Overrides /etc/swiftlm (for testing)') parser.add_option('--cloud', dest='cloud', default=None, help='The name of the cloud') parser.add_option('--control-plane', dest='control_plane', default=None, help='The name of the control plane') parser.add_option('--ring-delta', dest='ring_delta', default=None, help='Name of ring-delta file (as output or input' ' A value of "-" (on output means to write' ' to stdout') parser.add_option('--format', dest='fmt', default='yaml', 
                      help='One of yaml or json.'
                           ' When used with --ring-delta, specifies the'
                           ' format of the file.')
    parser.add_option('--detail', dest='detail', default='summary',
                      help='Level of detail to use with --report.'
                           ' Use summary or full')
    parser.add_option('--report', dest='report', default=False,
                      action="store_true",
                      help='Explain what the ring delta represents.'
                           ' Optionally use --detail.')
    parser.add_option('--dry-run', dest='dry_run', default=False,
                      action="store_true",
                      help='Show the proposed swift-ring-builder commands')
    parser.add_option('--pretend-min-part-hours-passed',
                      dest='pretend_min_part_hours_passed', default=False,
                      action="store_true",
                      help='Executes the pretend_min_part_hours_passed'
                           ' command on each ring before running rebalance.'
                           ' Use with caution.')
    parser.add_option('--make-delta', dest='make_delta', default=False,
                      action="store_true",
                      help='Make a ring delta file')
    parser.add_option('--rebalance', dest='rebalance', default=False,
                      action="store_true",
                      help='Build (or rebalance) rings')
    parser.add_option('--limit-ring', dest='limit_ring', default=None,
                      help='Limits actions to given ring')
    parser.add_option('--size-to-weight', dest='size_to_weight',
                      default=float(1024 * 1024 * 1024),
                      help='Conversion factor for size to weight. Default is'
                           ' 1GB is weight of 1 (a 4TB drive would be'
                           ' assigned a weight of 4096)')
    parser.add_option('--weight-step', dest='weight_step', default=None,
                      help='When set, weights are changed by at most this'
                           ' value. Overrides value in ring specification.')
    parser.add_option('--allow-partitions', dest='allow_partitions',
                      default=False, action='store_true',
                      help='Allow devices to be assigned to partitions.'
                           ' Default is to use a full disk drive.')
    parser.add_option('--stop-on-warnings', dest='stop_on_warnings',
                      default=False, action='store_true',
                      help='Used with --make-delta. Exit with error if there'
                           ' are model mismatch warnings.'
                           ' Default is to only exit with error for errors.')
    parser.add_option('--unittest', dest='unittest', default=False,
                      action='store_true',
                      help='Set by unittests. Never set on command line.')
    (options, args) = parser.parse_args()

    if not (options.cloud and options.control_plane):
        sys.exit('Must specify both --cloud and --control-plane')
    sites = CloudMultiSite(options)
    my_cloud = sites.my_cloud
    my_control_plane = sites.my_control_plane
    my_config = sites.path(my_cloud, my_control_plane)

    #
    # Work out what we need to do. Validate that arguments needed by an
    # action are present.
    #
    actions = []
    if options.make_delta:
        actions.append('init-delta')
        actions.append('input-from-model')
        actions.append('read-builder-dir')
        actions.append('open-osconfig-dir')
        actions.append('make-delta')
        actions.append('write-to-delta')
        if options.fmt not in ['yaml', 'json']:
            sys.exit('Invalid value for --format')
    if options.report:
        actions.append('init-delta')
        actions.append('read-from-delta')
        actions.append('report')
        if options.detail not in ['summary', 'full']:
            sys.exit('Invalid value for --detail')
    if options.rebalance:
        actions.append('init-delta')
        actions.append('open-builder-dir')
        actions.append('read-from-delta')
        actions.append('rebalance')
        if options.fmt not in ['yaml', 'json']:
            sys.exit('Invalid value for --format')
    if len(actions) == 0:
        sys.exit('Missing an option to perform some action')
    if options.report and (options.make_delta or options.rebalance):
        sys.exit('Do not mix --report with other actions')

    #
    # Perform actions
    #
    if 'init-delta' in actions:
        delta = RingDelta()
    if 'input-from-model' in actions:
        servers_model = ServersModel('unused', 'unused')
        consumes = Consumes()
        ring_model = RingSpecifications(my_cloud, my_control_plane)
        for cloud, control_plane in sites.control_planes():
            config = sites.path(cloud, control_plane)
            input_model_fd = None
            try:
                input_model_fd = open(config.get('input-model'), 'r')
            except IOError as err:
                pass  # File may not exist since it's a legacy item
            try:
                cp_server_fd = None
                cp_server_fd = open(config.get('control_plane_servers'),
                                    'r')
            except IOError as err:
                sys.exit('Error on control_plane_servers.yml: %s' % err)
            try:
                control_plane_servers = None
                if cp_server_fd:
                    control_plane_servers = safe_load(cp_server_fd)
            except scanner.ScannerError as err:
                sys.exit('ERROR reading/parsing: %s' % err)
            try:
                consumes_fd = open(
                    config.get('swift_ring_builder_consumes'), 'r')
            except IOError as err:
                sys.exit('ERROR: %s' % err)
            try:
                input_vars = {'global': {}}
                if input_model_fd:
                    input_vars = safe_load(input_model_fd)
                consumes_model = safe_load(consumes_fd)
            except scanner.ScannerError as err:
                sys.exit('ERROR reading/parsing: %s' % err)
            try:
                if control_plane_servers:
                    servers = control_plane_servers.get(
                        'control_plane_servers')
                elif input_vars.get('global').get('all_servers'):
                    servers = input_vars.get('global').get('all_servers')
                else:
                    sys.exit('No servers found in control plane')
                servers_model.add_servers(cloud, control_plane, servers)
                consumes.load_model(consumes_model)
            except SwiftModelException as err:
                sys.exit(err)
        servers_model.register_consumes(consumes)
        try:
            config_data_fd = open(my_config.get('configuration_data'), 'r')
            config_data = safe_load(config_data_fd)
        except (IOError, scanner.ScannerError) as err:
            sys.exit('Rings should be in configuration-data.'
                     ' Using old configuration processor? (%s)' % err)
        try:
            rings_loaded = False
            if input_vars.get('global').get('all_ring_specifications'):
                # Model contains Ardana old-style rings
                ring_model = RingSpecifications(my_cloud, my_control_plane,
                                                model=input_vars)
                rings_loaded = True
            if config_data and config_data.get('control_plane_rings',
                                               config_data.get(
                                                   'control-plane-rings')):
                # Model contains new-style rings --- use instead
                ring_model.load_configuration(my_cloud, my_control_plane,
                                              config_data)
                rings_loaded = True
            if not rings_loaded:
                sys.exit('No ring specifications in input model')
        except SwiftModelException as err:
            sys.exit(err)
    if 'open-builder-dir' in actions or 'read-builder-dir' in actions:
        try:
            read_rings = False
            if 'read-builder-dir' in actions:
                read_rings = True
            rings = RingBuilder(my_config.get('builder_dir'),
                                read_rings=read_rings)
        except IOError as err:
            sys.exit('ERROR: %s' % err)
    if 'open-osconfig-dir' in actions:
        drive_configurations = osconfig_load(sites)
    if 'make-delta' in actions:
        try:
            generate_delta(sites, servers_model, ring_model, rings,
                           drive_configurations, options, delta)
        except SwiftModelException as err:
            sys.exit('ERROR: %s' % err)
    if 'write-to-delta' in actions:
        if my_config.get('ring-delta') == '-':
            write_to_file_fd = sys.stdout
        else:
            write_to_file_fd = open(my_config.get('ring-delta'), 'w')
        delta.write_to_file(write_to_file_fd, options.fmt)
    if 'read-from-delta' in actions:
        if my_config.get('ring-delta') == '-':
            sys.exit('--ring-delta - is invalid (read from stdin'
                     ' not supported)')
        try:
            delta = RingDelta()
            read_from_delta_fd = open(my_config.get('ring-delta'), 'r')
            delta.read_from_file(read_from_delta_fd, options.fmt)
        except IOError as err:
            sys.exit('ERROR: %s' % err)
    if 'report' in actions:
        print(delta.get_report(options))
    if 'rebalance' in actions:
        rebalance(delta, rings, options)


def osconfig_load(sites):
    """
    Load disk drive data

    :param sites: provides access to the config directory structure
    :return: the device size data
    """
    drive_configurations = DriveConfigurations()
    for cloud, control_plane in sites.control_planes():
        osconfig_dir = sites.path(cloud, control_plane).get('osconfig')
        for host in os.listdir(osconfig_dir):
            filename = os.path.join(osconfig_dir, host,
                                    'drive_configuration.yml')
            if os.path.exists(filename):
                try:
                    with open(filename) as fd:
                        model_in_file = safe_load(fd)
                    drive_configuration = DriveConfiguration()
                    items = model_in_file.get(
                        'ardana_drive_configuration', [])
                    for item in items:
                        drive_configuration.load_model(item)
                    drive_configurations.add(drive_configuration)
                except (IOError, scanner.ScannerError) as err:
                    sys.exit('ERROR reading/parsing: %s' % err)
    return drive_configurations


def generate_delta(sites, servers_model, ring_model, rings,
                   drive_configurations, options, delta):
    """
    Generate delta between input model and existing rings

    This function compares the input model, the existing rings and device
    size data and works out what changes need to be made to the rings. The
    output is a ring delta data structure that, for each ring and device,
    contains instructions to either add, update or remove the ring or
    device.

    Major problems with the input data model will raise SwiftModelException.
    Minor problems are printed to stdout.
:param sites: Access to (multi) site setup :param servers_model: The input vars describing the servers :param ring_model: Ring specifications (from input model) :param rings: The existing rings (or placeholder if not already created) :param drive_configurations: device size data from probing hardware :param options: options to control ring building :param delta: The delta to generate :return: Nothing -- the output is in the delta argument """ model_errors = [] model_warnings = [] # Get ring specifications for this system my_cloud = sites.my_cloud my_control_plane = sites.my_control_plane control_plane_rings = ring_model.get_control_plane_rings(my_cloud, my_control_plane) if not control_plane_rings.is_primary_control_plane(): print('This is not the primary control_plane -- ring building is' ' not done on this system.') delta.register_primary(False) return # # Run through the rings in the model # Register them in the ring delta # If builder files do not exist, mark them to be created # for ringspec in control_plane_rings.rings: ring_name = ringspec.name delta.register_ring(ring_name, ringspec) if not rings.builder_rings.get(ring_name): # Ring is in input model, but no builder file exists delta.delta_ring_actions[ring_name] = ['add'] # Override replica count if there are not enough devices count_overriden = override_replica_count(ringspec, servers_model, model_errors, model_warnings) if count_overriden: ringspec.replica_count = count_overriden # # Cross check that we have ring specifications for each system # referenced by configuration files # try: for cl, cp in sites.control_planes(): if not ring_model.get_control_plane_rings(cl, cp): model_errors.append('Model Mismatch:' ' Cannot find rings-specification for' ' cloud %s control-plane %s.' ' This error may cause many subsequent' ' errors.' 
% (cl, cp)) except SwiftModelException as err: model_errors.append(err) # # Run through the builder files and match them against the model # Check if model and builder file attributes differ # If not in model anymore, mark the ring to be removed # for ring_name in rings.builder_rings.keys(): if delta.delta_rings.get(ring_name): # Ring already exists delta.delta_ring_actions[ring_name] = ['present'] model_ring = control_plane_rings.get_ringspec(ring_name) builder_ring = rings.get_ringspec(ring_name) # Handle replica_count upgrade to Mitaka delta_ring = delta.delta_rings.get(ring_name) count_overriden = override_replica_count(model_ring, servers_model, model_errors, model_warnings) if count_overriden: delta_ring.replica_count = count_overriden model_ring.replica_count = count_overriden # See if replica count or min_part_hours in model has changed if model_ring.replica_count != builder_ring.replica_count: delta.delta_ring_actions[ring_name].append('set-replica-count') if model_ring.min_part_hours != builder_ring.min_part_hours: delta.delta_ring_actions[ring_name].append( 'set-min-part-hours') # Copy the min_part_hours remaining time to the delta file model_ring['remaining'] = builder_ring.remaining else: # Found builder file, but ring not in model anymore delta.register_ring(ring_name, rings.builder_rings.get(ring_name)) delta.delta_ring_actions[ring_name] = ['remove'] # # Run through all devices in the input model # If already in ring, check if the weight should be changed # If not in ring, mark it to be added # try: for device_info in servers_model.iter_devices(): # Update the device info with the region and zone id from the # ring specifications. 
region_id, zone_id = control_plane_rings.get_region_zone( device_info.ring_name, device_info.server_groups) not_found_in = '' if not region_id: not_found_in = 'swift-regions' if not zone_id: not_found_in = 'swift-zones' if not_found_in: model_errors.append('Model Mismatch:' ' Cannot find server-groups %s in' ' "ring-specifications". Check the "%s"' ' item for ring %s.' ' Server is' ' %s' % (','.join( device_info.server_groups), not_found_in, device_info.ring_name, device_info.server_name) ) continue # -1 means not defined, default to 1 if region_id == -1: region_id = 1 if zone_id == -1: zone_id = 1 device_info.region_id = region_id device_info.zone_id = zone_id # See if the device is in a builder file of an existing ring found = False for in_ring_device_info in rings.flat_device_list: if device_info.is_same_device(in_ring_device_info): found = True break # Attempt to get disk size information from the drive # configuration data hw_size, hw_fulldrive = drive_configurations.get_hw( device_info.server_name, device_info) if found: # # The device is already in the ring # # Start by assuming no action needed device_info.presence = 'present' # What should the target weight be? if servers_model.server_draining(device_info.server_name): # Draining -- should be 0 model_weight = '{:.2f}'.format(float(0.0)) elif hw_size: # Have drive size -- use drive size as target model_weight = '{:.2f}'.format( float(hw_size) / float(options.size_to_weight) or 1.0) else: # Do not have size information for the drive. Lets assume # the existing weight is ok model_weight = '{:.2f}'.format( float(in_ring_device_info.current_weight)) # How much does model differ from current weight? 
current_weight = '{:.2f}'.format( float(in_ring_device_info.current_weight)) step = (options.weight_step or control_plane_rings.get_ringspec( ring_name).weight_step or max(current_weight, model_weight)) change = float(model_weight) - float(current_weight) if change > 0: # Weight being changed upwards target_weight = min(float(model_weight), float(current_weight) + float(step)) target_weight = '{:.2f}'.format( float(target_weight)) elif change < 0: target_weight = max(float(model_weight), float(current_weight) - float(step)) target_weight = '{:.2f}'.format( float(target_weight)) else: # Unchanged target_weight = current_weight # Do we need to change the ring? if target_weight != current_weight: # Yes -- ask for a weight change action device_info.target_weight = target_weight device_info.presence = 'set-weight' else: # Unchanged device_info.target_weight = current_weight device_info.model_weight = model_weight device_info.current_weight = current_weight # Is planned for removal? if servers_model.server_removing(device_info.server_name): # Ask for device to be removed device_info.presence = 'remove' device_info.current_weight = current_weight device_info.target_weight = '0.00' device_info.model_weight = '0.00' # Check that model change did not attempt to change zone, etc. changed_item = device_info.is_bad_change(in_ring_device_info) if changed_item: model_errors.append('Model Mismatch:' ' Illegal change of %s for %s on' ' %s (%s)' % (changed_item, device_info.device_name, device_info.server_name, device_info.server_ip)) else: # # Device is not in the ring # if not control_plane_rings.get_ringspec(device_info.ring_name): model_errors.append('Model Mismatch:' ' There is no specification for ring' ' %s. 
See disk model' ' for %s' % (device_info.ring_name, device_info.server_name)) continue device_info.presence = 'add' if not hw_size: model_errors.append('Model Mismatch:' ' Cannot find drive %s on' ' %s (%s)' % (device_info.device_name, device_info.server_name, device_info.server_ip)) elif not hw_fulldrive and not options.allow_partitions: model_errors.append('Model Mismatch:' ' Drive %s on %s (%s) has' ' several partitions' % (device_info.device_name, device_info.server_name, device_info.server_ip)) else: model_weight = '{:.2f}'.format( float(hw_size) / float(options.size_to_weight) or 1.0) weight_step = (options.weight_step or control_plane_rings.get_ringspec( device_info.ring_name).weight_step) if weight_step: if (float(model_weight) > float(weight_step)): target_weight = '{:.2f}'.format(float(weight_step)) else: target_weight = model_weight else: target_weight = model_weight device_info.target_weight = target_weight device_info.model_weight = model_weight device_info.current_weight = '0.00' # However, do not add devices for a server marked for removal # or draining. if (servers_model.server_removing(device_info.server_name) or servers_model.server_draining( device_info.server_name)): # Do not append to delta continue delta.append_device(device_info) # end of looping though devices except SwiftModelException as err: model_errors.append(err) delta.sort() # # Run through all devices in builder files # If not in model anymore, mark device to be removed # for in_ring_device_info in rings.flat_device_list: found = False for device_info in servers_model.iter_devices(): if device_info.is_same_device(in_ring_device_info): found = True break if not found: # Device is in the builder file, but has been removed from the # input model. 
Ask for device to be removed in_ring_device_info.presence = 'remove' delta.append_device(in_ring_device_info) if model_errors: for model_error in model_errors: print(model_error) raise SwiftModelException('There are errors or mismatches between' ' the input model and the configuration' ' of server(s).\n' ' Cannot proceed. Correct the errors' ' and try again') if model_warnings and options.stop_on_warnings: for model_warning in model_warnings: print(model_warning) raise SwiftModelException('There are minor mismatches between the' ' input model and the configuration of' ' servers. These are warning severity.' ' We recommend you correct the errors.') def override_replica_count(ringspec, input_model, model_errors, model_warnings): """ The replica count cannot be greater than the number of devices This function works out a replica count of a ring if there are not enough devices. For replicated rings, we will warn the user (but build the ring anyway). For EC rings, we generate an error (which will stop the process). If the replica count is ok, we return None. We also return None for EC rings as the replica count is not directly settable. 
    :param ringspec: the ring specification in the input model
    :param input_model: the input model
    :param model_errors: we append any errors here
    :param model_warnings: we append any warnings here
    :return: the appropriate replica count or None if it should not be
             changed
    """
    replica_count = ringspec.replica_count
    ring_name = ringspec.name
    num_devices = input_model.get_num_devices(ring_name)
    if num_devices == 0:
        model_errors.append('There are no devices assigned to'
                            ' ring %s' % ring_name)
        return None
    if replica_count > num_devices:
        if ringspec.replication_policy:
            model_warnings.append('In ring %s there are not enough devices'
                                  ' -- changing the replica count'
                                  ' to %s' % (ring_name, num_devices))
            return num_devices
        else:
            model_errors.append('In ring %s there are not enough devices to'
                                ' support this number of data and parity'
                                ' fragments' % ring_name)
    # No change needed
    return None


def rebalance(delta, rings, options):
    """
    Run swift-ring-builder commands

    This function examines the ring delta and issues commands to
    create/add, modify or remove rings or devices. Finally, it executes a
    rebalance command.

    If the swift-ring-builder reports an error, we error-exit in this
    function (so that the playbook stops).

    :param delta: The ring delta
    :param rings: Existing rings (or placeholder if not already created)
    :param options: Options affecting this function (such as --dry-run)
    :return: Commands executed (unit tests emulate --dry-run)
    """
    if not delta.primary:
        if options.dry_run:
            print('Not primary site. No ring building occurs here')
        return
    for ring_name in delta.delta_rings.keys():
        if not os.path.isdir(rings.builder_dir):
            os.mkdir(rings.builder_dir)
    cmds = []
    for ring_name in delta.delta_rings.keys():
        if options.limit_ring and (options.limit_ring != ring_name):
            continue
        if 'add' in delta.delta_ring_actions.get(ring_name):
            ringspec = delta.delta_rings.get(ring_name)
            cmds.append(rings.command_ring_create(ringspec))
        if 'set-replica-count' in delta.delta_ring_actions.get(ring_name):
            ringspec = delta.delta_rings.get(ring_name)
            cmds.append(rings.command_set_replica_count(ringspec))
        if 'set-min-part-hours' in delta.delta_ring_actions.get(ring_name):
            ringspec = delta.delta_rings.get(ring_name)
            cmds.append(rings.command_set_min_part_hours(ringspec))
    for device_info in delta.delta_devices:
        ring_name = device_info.ring_name
        if options.limit_ring and (options.limit_ring != ring_name):
            continue
        if device_info.presence == 'add':
            cmds.append(rings.command_device_add(device_info))
        elif device_info.presence == 'remove':
            cmds.append(rings.command_device_remove(device_info))
        elif device_info.presence == 'set-weight':
            cmds.append(rings.command_device_set_weight(device_info))
    for ring_name in delta.delta_rings.keys():
        if options.limit_ring and (options.limit_ring != ring_name):
            continue
        ringspec = delta.delta_rings.get(ring_name)
        if 'add' not in delta.delta_ring_actions.get(ring_name):
            # pretend_min_part_hours cannot be used on a newly created ring
            if options.pretend_min_part_hours_passed:
                cmds.append(rings.command_pretend_min_part_hours_passed(
                    ringspec))
        cmds.append(rings.command_rebalance(ringspec))
    if options.dry_run:
        for cmd in cmds:
            print('DRY-RUN: %s' % cmd)
    else:
        for cmd in cmds:
            print('Running: %s' % cmd)
            status, output = rings.run_cmd(cmd)
            if status > 0:
                sys.exit('ERROR: %s' % output)
            elif status < 0:
                print('NOTE: %s' % output)
    return cmds


if __name__ == '__main__':
    main()
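# The gradual weight-change rule that generate_delta applies (current weight
# moves toward the model weight by at most --weight-step per rebalance cycle)
# can be sketched in isolation as follows. This is a minimal illustrative
# sketch, not part of swiftlm: `next_weight` is a hypothetical name, and the
# real code works on '{:.2f}'-formatted strings rather than plain floats.

```python
def next_weight(current, model, step):
    """Move `current` toward `model`, changing it by at most `step`.

    New devices ramp up toward their model weight over several rebalance
    cycles, and draining devices ramp down, which limits how many
    partitions move in any single rebalance.
    """
    if model > current:
        return round(min(model, current + step), 2)
    if model < current:
        return round(max(model, current - step), 2)
    return current


# A new 4 TB drive (model weight 4096) added with --weight-step 1024
# reaches full weight over four rebalance cycles:
w = 0.0
history = []
for _ in range(4):
    w = next_weight(w, 4096.0, 1024.0)
    history.append(w)
```

# With no weight-step configured, the clamp is effectively a no-op and the
# device jumps straight to its model weight in one rebalance.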
0707010000003A000081A4000003E800000064000000015BE06E0300008842000000000000000000000000000000000000004400000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/cli/uptime_mon.py # (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # import socket from os import getenv from os import path from sys import exit from collections import OrderedDict import sys import urlparse import uuid import random from swiftclient import Connection from swiftclient import ClientException from swiftclient import http_connection from swiftclient import RequestException from requests.exceptions import ConnectionError import time import argparse import ConfigParser from swiftlm.utils.utility import get_logger from swiftlm.utils.utility import dump_swiftlm_uptime_data, timestamp, Enum, \ sleep_interval from httplib import HTTPException SERVICE_NAME = 'object-storage' MIN_LATENCY = 'swiftlm.umon.target.min.latency_sec' MAX_LATENCY = 'swiftlm.umon.target.max.latency_sec' AVG_LATENCY = 'swiftlm.umon.target.avg.latency_sec' SWIFT_STATE = 'swiftlm.umon.target.check.state' AVAIL_MINUTE = 'swiftlm.umon.target.val.avail_minute' AVAIL_DAY = 'swiftlm.umon.target.val.avail_day' COMPONENT_KEYSTONE_GET_TOKEN = 'keystone-get-token' COMPONENT_REST_API = 'rest-api' COMPONENT_HEALTHCHECK_API = 'healthcheck-api' LATENCY_LOG_INTERVAL = 600 # Log latencies after a number of cycles WAKE_UP_SECOND = 30 # Synchronise sleeps so we 
# wake up in middle of minute

common_dimensions = dict()

# 0 = ok, 1 = warn, 2 = fail, 3 = unknown
component_states = Enum(['ok', 'warn', 'fail', 'unknown'])
STATE_VALUE_OK = 0
STATE_VALUE_WARN = 1
STATE_VALUE_FAIL = 2
STATE_VALUE_UNKNOWN = 3

# 0 = Service is down or percent of uptime
# 100 = Service is up or percentage of uptime
SWIFT_DOWN = 0
SWIFT_UP = 100


class UPtimeMonException(Exception):
    pass


def health_check(url, logger):
    scheme = urlparse.urlparse(url).scheme
    netloc = urlparse.urlparse(url).netloc
    url = scheme + '://' + netloc + '/healthcheck'
    parsed, conn = http_connection(url)
    logger.debug('GET %s' % url)
    conn.request('GET', parsed.path, '', {'X-Auth-Token': 'none-needed'})
    resp = conn.getresponse()
    resp.read()
    if resp.status < 200 or resp.status >= 300:
        raise ClientException('GET /healthcheck failed',
                              http_scheme=parsed.scheme,
                              http_host=conn.host,
                              http_port=conn.port,
                              http_path=parsed.path,
                              http_status=resp.status,
                              http_reason=resp.reason)
    resp_headers = {}
    for header, value in resp.getheaders():
        resp_headers[header.lower()] = value
    return resp_headers


def endpoint_trim(url, extension=None):
    s = urlparse.urlparse(url).netloc.split(':')[0]
    if extension is None:
        return s
    else:
        return s + '-' + extension


class TrackConnection(Connection):
    def __init__(self, auth_url, user_name, key, logger, cache_file_path,
                 object_store_url=None, auth_version="2", os_options=None,
                 latency_log_interval=LATENCY_LOG_INTERVAL):
        socket.setdefaulttimeout(30.0)  # timeout set at socket level
        Connection.__init__(self, auth_url, user_name, key, retries=0,
                            os_options=os_options, auth_version=auth_version)
        # needed if keystone-get-token does not work
        self.object_store_url = object_store_url
        self.state = dict()
        for component_name in self.component_names():
            self.state[component_name] = \
                dict(current_state=component_states.unknown,
                     reason='', metrics={})
        self.uptime = OrderedDict()
        self.latency = dict()
        self.metric_data = []
        self.latency_reset()
        self.logger = logger
        self.cache_file_path = cache_file_path
        self.loop_end_time = time.time()
        self.latency_log_interval = latency_log_interval
        self.last_latency_logged = 0  # Will trigger immediate log

    @staticmethod
    def component_names():
        return (COMPONENT_KEYSTONE_GET_TOKEN,
                COMPONENT_REST_API,
                COMPONENT_HEALTHCHECK_API)

    def metric_data_reset(self):
        self.metric_data = []

    def dump_metric_data(self):
        self.logger.debug("PRINTING CONTENTS OF METRIC DATA")
        # Read the component state metrics and append them to metrics_data
        # dict
        for component_name in self.component_names():
            metric = self.state[component_name]['metrics'].copy()
            # Time is now rather than the time we moved to this state
            metric['timestamp'] = timestamp()
            self.metric_data.append(metric)
        self.logger.debug(self.metric_data)
        dump_swiftlm_uptime_data(self.metric_data, self.cache_file_path,
                                 self.logger)

    def uptime_record(self, avail_percentage):
        _now = time.time()
        _avail_period = 60
        _avail_epoch = 60 * 60 * 24
        # Keeping at least two records so the first is not the last, and at
        # most the last _avail_epoch (24 hours) of uptime records
        if len(self.uptime) > 1:
            # First record in uptime data
            _first_record = self.uptime.keys()[0]
            # Last record in uptime data
            _last_record = self.uptime.keys()[-1]
            # Record a value every minute
            if (_now - _last_record) >= _avail_period:
                # If the time from the last loop is more than twice the
                # availability period of one minute, backfill the
                # uptime data with last known avail_percentage.
                if (_now - _last_record) >= (2 * _avail_period):
                    for _ in range(int((_now - _last_record) //
                                       _avail_period)):
                        self.uptime[(self.uptime.keys()[-1] +
                                     _avail_period)] = avail_percentage
                self.uptime[_now] = avail_percentage
        else:
            # This is the first entry in the uptime data and the endpoints
            # are taking longer than _avail_period to respond, so backfill
            # with last known avail_percentage.
            if self.loop_end_time <= (_now - _avail_period):
                self.uptime[self.loop_end_time] = avail_percentage
                # Backfill uptime data from last loop end time in
                # increments of one minute.
                for _ in range(int((_now - self.loop_end_time) //
                                   _avail_period)):
                    self.uptime[(self.uptime.keys()[-1] +
                                 _avail_period)] = avail_percentage
            self.uptime[_now] = avail_percentage
        # Purge the oldest record(s) in the uptime data if older
        # than _avail_epoch.
        while True:
            if self.uptime.keys()[0] <= (_now - _avail_epoch):
                self.uptime.popitem(last=False)
            else:
                break
        # The logging data is reversed because the string size truncates
        # and we won't see updates to the end of the data
        self.logger.debug("+++++++++++++++ Uptime Data %s" %
                          OrderedDict(sorted(self.uptime.items(),
                                             key=lambda t: t[0],
                                             reverse=True)))

    def emit_avail_metrics(self):
        # Total of all values in uptime dictionary
        _total = 0
        # Report the most recent minute in the log
        _avail_minute = self.uptime.values()[-1]
        # Report the most recent timestamp in the log
        _avail_timestamp = int(round(self.uptime.keys()[-1]))
        # Report the average of all minutes for the last 24 hours
        for k, v in self.uptime.items():
            _total += v
        _avail_day = float(_total / len(self.uptime))
        dimensions = common_dimensions.copy()
        dimensions['component'] = COMPONENT_REST_API
        target_name = urlparse.urlparse(self.object_store_url).hostname
        dimensions['url'] = self.object_store_url
        dimensions['hostname'] = '_'
        self.metric_data.append(
            dict(metric=AVAIL_MINUTE, value=_avail_minute,
                 dimensions=dimensions, timestamp=_avail_timestamp))
        self.metric_data.append(
            dict(metric=AVAIL_DAY, value=_avail_day,
                 dimensions=dimensions, timestamp=_avail_timestamp))

    def latency_reset(self):
        for component_name in self.component_names():
            self.latency[(component_name, 'num-samples')] = 0
            self.latency[(component_name, 'total-time')] = 0
            # We have to seed the minimum with a non-zero value
            self.latency[(component_name, 'min-latency')] = None
            self.latency[(component_name, 'max-latency')] = 0

    def latency_record(self, component_name, duration):
        if not self.latency[(component_name, 'min-latency')]:
            self.latency[(component_name, 'min-latency')] = duration
        if duration < self.latency[(component_name, 'min-latency')]:
            self.latency[(component_name, 'min-latency')] = duration
        if duration > self.latency[(component_name, 'max-latency')]:
            self.latency[(component_name, 'max-latency')] = duration
        self.latency[(component_name, 'num-samples')] += 1
        self.latency[(component_name, 'total-time')] += duration

    def latency_write_log(self):
        now = time.time()
        for component_name in self.component_names():
            if self.latency[(component_name, 'total-time')] == 0:
                self.logger.info('No latency data for %s' % component_name)
                continue
            if self.latency[(component_name, 'min-latency')] is None:
                self.latency[(component_name, 'min-latency')] = 0.0
            min_latency = self.latency[(component_name, 'min-latency')]
            max_latency = self.latency[(component_name, 'max-latency')]
            avg_latency = float(self.latency[(component_name, 'total-time')] /
                                self.latency[(component_name, 'num-samples')])
            if (now - self.last_latency_logged) >= LATENCY_LOG_INTERVAL:
                self.logger.info('Metric:%s:min-latency: %s (at %s)' %
                                 (component_name, min_latency, now))
                self.logger.info('Metric:%s:max-latency: %s (at %s)' %
                                 (component_name, max_latency, now))
                self.logger.info('Metric:%s:avg-latency: %s (at %s)' %
                                 (component_name, avg_latency, now))
                self.last_latency_logged = now
            dimensions = common_dimensions.copy()
            dimensions['component'] = component_name
            if component_name == COMPONENT_KEYSTONE_GET_TOKEN:
                target_url = self.authurl
            else:
                target_url = self.object_store_url
            target_name = urlparse.urlparse(target_url).hostname
            dimensions['url'] = target_url
            dimensions['hostname'] = '_'
            self.metric_data.append(
                dict(metric=MIN_LATENCY, value=min_latency,
                     dimensions=dimensions, timestamp=timestamp()))
            self.metric_data.append(
                dict(metric=MAX_LATENCY, value=max_latency,
                     dimensions=dimensions, timestamp=timestamp()))
            self.metric_data.append(
                dict(metric=AVG_LATENCY, value=avg_latency,
                     dimensions=dimensions, timestamp=timestamp()))

    def record_state(self, component_name, new_state, reason):
        self.logger.debug(" ++++++++++++++ record_state called %s %s %s" %
                          (component_name, new_state, reason))
        # Pick out the name (drop https://, port and path)
        k = component_name
        if k in self.state.keys():
            old_state = self.state[component_name]['current_state']
        else:
            old_state = component_states.unknown
            state_details = dict(current_state=old_state, reason='',
                                 metrics={})
            self.state[component_name] = state_details
        if old_state != new_state:
            now = time.time()
            state_details = self.state[component_name]
            state_details['current_state'] = new_state
            state_details['reason'] = reason
            dimensions = common_dimensions.copy()
            dimensions['component'] = component_name
            if component_name == COMPONENT_KEYSTONE_GET_TOKEN:
                target_url = self.authurl
            else:
                target_url = self.object_store_url
            target_name = urlparse.urlparse(target_url).hostname
            dimensions['url'] = target_url
            dimensions['hostname'] = '_'
            # Report state as 0 for ok, 1 for warn, 2 for fail, and 3 for
            # unknown. Monasca alarm expressions support only numerical
            # values, hence report numbers for state so that we can
            # configure alarms in Monasca.
            if new_state == component_states.fail:
                state_details['metrics'] = \
                    dict(metric=SWIFT_STATE, value=STATE_VALUE_FAIL,
                         dimensions=dimensions, timestamp=timestamp(),
                         value_meta={'msg': str(reason).strip('\n')})
            elif new_state == component_states.ok:
                state_details['metrics'] = \
                    dict(metric=SWIFT_STATE, value=STATE_VALUE_OK,
                         dimensions=dimensions, timestamp=timestamp())
            else:
                # this condition may not exist, still adding it to
                # be complete
                state_details['metrics'] = \
                    dict(metric=SWIFT_STATE, value=STATE_VALUE_UNKNOWN,
                         dimensions=dimensions, timestamp=timestamp())
            message = '{0} {1} {2}'.format(component_name, new_state, reason)
            # We log all state transitions at INFO level
            self.logger.info('State-change:{component_name}: {new_state}'
                             ' (at {now}): {reason}'.format(
                                 **{'component_name': component_name,
                                    'new_state': new_state,
                                    'now': now,
                                    'reason': message.replace('\n', ' ')}))

    def check_keystone_get_token(self):
        # Ask keystone for a token
        retries = 1  # don't try too hard
        attempts = 0
        component = COMPONENT_KEYSTONE_GET_TOKEN
        component_state = component_states.ok
        reason = 'success'
        while attempts <= retries:
            start_time = time.time()
            try:
                self.logger.debug('Doing GET AUTH')
                start_time = time.time()
                self.url, token = self.get_auth()
                duration = time.time() - start_time
                self.latency_record(component, duration)
                self.logger.debug('SUCCESS; token: %s in %s' %
                                  (self.token, duration))
                self.token = token
                component_state = component_states.ok
                reason = 'success'
            except (socket.error, HTTPException, ClientException,
                    ConnectionError) as err:
                duration = time.time() - start_time
                self.latency_record(component, duration)
                self.record_state(component, component_states.fail, err)
                component_state = component_states.fail
                reason = err
            if component_state == component_states.fail:
                time.sleep(1)
                attempts += 1
            else:
                attempts = retries + 1  # break out of loop
        self.record_state(component, component_state, reason)
        return component_state

    def check_object_store(self):
        # If no token then cannot perform request
        if not self.token:
            self.record_state(COMPONENT_REST_API, component_states.fail,
                              'Authentication failed')
            return
        retries = 3
        attempts = 0
        tinyobj_contents = str(uuid.uuid4())  # Create random contents
        randint = random.randint(0, 500)
        tinyobj_name = 'tinyobj-' + str(randint) + '-' + socket.gethostname()
        component_state = component_states.ok
        reason = 'success'
        while attempts <= retries:
            obj_start_time = time.time()
            start_time = obj_start_time
            try:
                self.http_conn = None  # Force new connection
                self.logger.debug('Doing OBJECT PUT/GET/DELETE')
                obj_start_time = time.time()
                start_time = obj_start_time
                self.head_account()
                duration = time.time() - start_time
                self.logger.debug('head-account ok in %s' % duration)
                start_time = time.time()
                self.put_container('swift_monitor_latency_test')
                duration = time.time() - start_time
                self.logger.debug('put-container ok in %s' % duration)
                start_time = time.time()
                self.put_object('swift_monitor_latency_test', tinyobj_name,
                                tinyobj_contents)
                duration = time.time() - start_time
                self.logger.debug('put-object ok in %s' % duration)
                start_time = time.time()
                headers, body = self.get_object('swift_monitor_latency_test',
                                                tinyobj_name,
                                                resp_chunk_size=65536)
                chunks = []
                for chunk in body:
                    self.logger.debug('get-object chunk: %s' % chunk)
                    chunks.append(chunk)
                duration = time.time() - start_time
                self.logger.debug('get-object ok in %s' % duration)
                start_time = time.time()
                self.delete_object('swift_monitor_latency_test', tinyobj_name)
                duration = time.time() - start_time
                self.logger.debug('delete-object ok in %s' % duration)
                duration = time.time() - obj_start_time
                self.latency_record(COMPONENT_REST_API, duration)
                component_state = component_states.ok
                reason = 'success'
            except (socket.error, HTTPException, ClientException,
                    ConnectionError) as err:
                duration = time.time() - start_time
                self.latency_record(COMPONENT_REST_API, duration)
                self.record_state(COMPONENT_REST_API, component_states.fail,
                                  err)
                component_state = component_states.fail
                reason = err
            if component_state == component_states.fail:
                time.sleep(1)
                attempts += 1
            else:
                attempts = retries + 1  # break out of loop
        self.record_state(COMPONENT_REST_API, component_state, reason)
        return component_state

    def check_object_store_health_check(self, logger):
        component = COMPONENT_HEALTHCHECK_API
        retries = 3
        attempts = 0
        component_state = component_states.ok
        reason = 'success'
        while attempts <= retries:
            start_time = time.time()
            try:
                self.http_conn = None  # Force new connection
                logger.debug('Doing GET /healthcheck')
                start_time = time.time()
                health_check(self.object_store_url, self.logger)
                duration = time.time() - start_time
                self.latency_record(component, duration)
                logger.debug('Ok in %s' % duration)
                component_state = component_states.ok
                reason = 'success'
            except (socket.error, HTTPException, ClientException,
                    RequestException, ConnectionError) as err:
                duration = time.time() - start_time
                self.latency_record(component, duration)
                self.record_state(component, component_states.fail, err)
                component_state = component_states.fail
                reason = err
            if component_state == component_states.fail:
                time.sleep(1)
                attempts += 1
            else:
                attempts = retries + 1  # break out of loop
        self.record_state(component, component_state, reason)
        return component_state


def main_loop(parsed_arguments, logger):
    args_to_log = [(key, value) for key, value in parsed_arguments.items()
                   if key != 'password']
    logger.info('Starting swiftlm uptime monitor service with following'
                ' parameters:'
                '%s' % args_to_log)
    os_options = {}
    if parsed_arguments['region']:
        os_options['region'] = parsed_arguments['region']
    if parsed_arguments['project_id']:
        os_options['project_id'] = parsed_arguments['project_id']
    elif parsed_arguments['project_name']:
        os_options['project_name'] = parsed_arguments['project_name']
    if parsed_arguments['endpoint_type']:
        os_options['endpoint_type'] = parsed_arguments['endpoint_type']
    if parsed_arguments['project_domain_name']:
        os_options['project_domain_name'] = \
            parsed_arguments['project_domain_name']
    if parsed_arguments['user_domain_name']:
        os_options['user_domain_name'] = parsed_arguments['user_domain_name']
    conn = TrackConnection(parsed_arguments['keystone_auth_url'],
                           parsed_arguments['user_name'],
                           parsed_arguments['password'],
                           logger,
                           parsed_arguments['cache_file_path'],
                           object_store_url=parsed_arguments[
                               'object_store_url'],
                           auth_version=parsed_arguments['auth_version'],
                           os_options=os_options,
                           latency_log_interval=parsed_arguments[
                               'latencyLogInterval'])
    common_dimensions['observer_host'] = socket.gethostname()
    common_dimensions['service'] = SERVICE_NAME
    if parsed_arguments['region']:
        common_dimensions['region'] = parsed_arguments['region']
    # Get into sync with wake up time
    time.sleep(sleep_interval(parsed_arguments['main_loop_interval'],
                              conn.loop_end_time, WAKE_UP_SECOND))
    conn.loop_end_time = time.time()
    while True:
        conn.latency_reset()
        conn.metric_data_reset()
        if conn.user != "None":
            service_state = conn.check_keystone_get_token()
        if conn.user != "None":
            for proxy in range(0,
                               parsed_arguments['objectChecksPerInterval']):
                service_state = conn.check_object_store()
                if service_state != component_states.ok:
                    break
            if service_state == component_states.ok:
                conn.uptime_record(SWIFT_UP)
            else:
                conn.uptime_record(SWIFT_DOWN)
        for proxy in range(0, parsed_arguments['objectChecksPerInterval']):
            service_state = conn.check_object_store_health_check(logger)
            if service_state != component_states.ok:
                break
        conn.latency_write_log()
        conn.emit_avail_metrics()
        conn.dump_metric_data()
        conn.loop_end_time = time.time()
        # Sleep until next wake up
        time.sleep(sleep_interval(parsed_arguments['main_loop_interval'],
                                  conn.loop_end_time, WAKE_UP_SECOND))


usage_message = """
OVERVIEW
    This program performs a set of operations as follows at each cycle:

    1/ Gets a token from the Keystone service (keystone-get-token)

    2/ Performs a number of HEAD operations against the account using this
       token and the Swift URL/account (rest-api). If we cannot get a token
       from the Keystone service, we continue to use the existing token. If
       this becomes invalid, the rest-api is marked failed.

    3/ Performs a number of healthcheck-api operations against the Swift
       endpoint (healthcheck-api)

    The rest-api and healthcheck-api operations are repeated many times
    each cycle. The idea is to make the load balancers round-robin us
    through all the proxies. See checks_per_interval below.

    Writes to syslog LOG_LOCAL0 facility as follows:

    - A record of each transition from ok to failure and vice versa.
      Records look something like:

          20120418-20:00.33 1334779233.48 hard rest-api failed ECONNREFUSED
          20120418-20:00.43 1334779243.50 hard rest-api ok success

      Fields are as follows:

          20120418-20:00.33 - human readable date
          1334779233.48     - timestamp in seconds
          ok/fail           - means the component failed even after several
                              retries. If this is a temporary glitch, you
                              should expect to see an ok within a few
                              seconds. If not, then expect to see fail
                              after retries are exhausted.
          rest-api          - The name of the component. The names of all
                              components are as follows:
                              keystone-get-token -- the AUTH service
                              rest-api -- Swift access using a token
                              healthcheck-api -- Swift healthcheck-api
                                                 (no token)
          ECONNREFUSED      - on failures, records the immediate cause

    - Each access is timed. At each cycle the latency data is written to a
      /var/cache/swift/swiftlm_uptime_monitor/uptime.stats file and at
      every Nth cycle latency data is written to syslog LOG_LOCAL0
      facility. The data includes the average and max latency over all the
      requests made against a service for the cycle. The records look like:

          2012/04/19 14:58:01 UTC metric:rest-api:min-latency: 0.0120283740000
          2012/04/19 14:58:01 UTC metric:rest-api:max-latency: 0.0152399539948
          2012/04/19 14:58:01 UTC metric:rest-api:avg-latency: 0.0136341639927

      The unit at the end is seconds.
    We measure latency of the keystone-get-token, rest-api and
    healthcheck-api components.

FILES
    The configuration file is specified using the -c/--config option. This
    file looks something like:

        [logging]
        # You can specify default log routing here if you want:
        log_level = info
        log_facility = LOG_LOCAL0
        log_format = '%(name)s - %(levelname)s : %(message)s'

        [latency_monitor]
        # Time between each cycle
        interval:60
        # Number of operations to make each cycle. Make this larger
        # than number of proxy servers
        checks_per_interval:70
        # The file path where the uptime stats are written
        cache_file_path: /var/cache/swift/swiftlm_uptime_monitor/uptime.stats
        # You must specify both keystone_auth_url and object_store_url
        # object_store_url is required for performing the healthcheck
        # operation
        keystone_auth_url: https://region-1.identity.my.com:35357/v2.0/
        object_store_url: https://region-1.objects.my.com/v1.0/
        endpoint_type: internalURL
        # project and user domain names
        project_domain_name = Default
        user_domain_name = Default
        # Credential information -- uses Keystone V2 format
        user_name: my.address@example.com
        password: whatever
        # If both project_name and project_id are specified, project_id is
        # used for authentication
        project_id: 12345678912345
        project_name: myproject
        auth_version:2  # if not specified defaults to 2
"""


def parse_args(args):
    parser = argparse.ArgumentParser(
        description='swift-uptime-mon --config CONFIGFILE',
        usage=usage_message)
    parser.add_argument('-c', '--config', dest='configFile',
                        default='/etc/swift/swiftlm-uptime-monitor.conf')
    return parser.parse_args(args)


def validate_args(config):
    if not config.has_section('logging'):
        raise UPtimeMonException(
            "Please specify logging options in config file")
    if not config.has_section('latency_monitor'):
        raise UPtimeMonException(
            "Please specify latency_monitor options in config file")
    logger = get_logger(dict(config.items('logging')), name='uptime-mon')
    parsed_arguments = {}
    try:
        parsed_arguments['object_store_url'] = \
            config.get('latency_monitor', 'object_store_url')
    except ConfigParser.NoOptionError:
        msg = "Please provide Object Store URL, Quitting swift uptime mon"
        logger.exception(msg)
        raise UPtimeMonException(msg)
    try:
        parsed_arguments['endpoint_type'] = config.get('latency_monitor',
                                                       'endpoint_type')
    except ConfigParser.NoOptionError:
        parsed_arguments['endpoint_type'] = getenv('OS_ENDPOINT_TYPE')
    try:
        parsed_arguments['project_domain_name'] = config.get(
            'latency_monitor', 'project_domain_name')
    except ConfigParser.NoOptionError:
        parsed_arguments['project_domain_name'] = \
            getenv('OS_PROJECT_DOMAIN_NAME')
    try:
        parsed_arguments['user_domain_name'] = config.get('latency_monitor',
                                                          'user_domain_name')
    except ConfigParser.NoOptionError:
        parsed_arguments['user_domain_name'] = getenv('OS_USER_DOMAIN_NAME')
    try:
        parsed_arguments['main_loop_interval'] = int(
            config.get('latency_monitor', 'interval'))
    except ConfigParser.NoOptionError:
        parsed_arguments['main_loop_interval'] = 60
    try:
        parsed_arguments['objectChecksPerInterval'] = int(
            config.get('latency_monitor', 'checks_per_interval'))
    except ConfigParser.NoOptionError:
        parsed_arguments['objectChecksPerInterval'] = 40
    try:
        parsed_arguments['latencyLogInterval'] = int(
            config.get('latency_monitor', 'latency_log_interval'))
    except ConfigParser.NoOptionError:
        parsed_arguments['latencyLogInterval'] = LATENCY_LOG_INTERVAL
    try:
        parsed_arguments['keystone_auth_url'] = config.get(
            'latency_monitor', 'keystone_auth_url')
    except ConfigParser.NoOptionError:
        parsed_arguments['keystone_auth_url'] = getenv('OS_AUTH_URL')
    try:
        parsed_arguments['user_name'] = config.get('latency_monitor',
                                                   'user_name')
    except ConfigParser.NoOptionError:
        parsed_arguments['user_name'] = getenv('OS_USERNAME')
    try:
        parsed_arguments['password'] = config.get('latency_monitor',
                                                  'password')
    except ConfigParser.NoOptionError:
        parsed_arguments['password'] = getenv('OS_PASSWORD')
    try:
        parsed_arguments['auth_version'] = config.get('latency_monitor',
                                                      'auth_version')
    except ConfigParser.NoOptionError:
        # If auth version is not specified default it to 2.0
        parsed_arguments['auth_version'] = '2.0'
    try:
        parsed_arguments['project_id'] = config.get('latency_monitor',
                                                    'project_id')
    except ConfigParser.NoOptionError:
        parsed_arguments['project_id'] = getenv('OS_PROJECT_ID')
    try:
        parsed_arguments['project_name'] = config.get('latency_monitor',
                                                      'project_name')
    except ConfigParser.NoOptionError:
        parsed_arguments['project_name'] = getenv('OS_PROJECT_NAME')
    try:
        parsed_arguments['region'] = config.get('latency_monitor', 'region')
    except ConfigParser.NoOptionError:
        parsed_arguments['region'] = getenv('OS_REGION')
    try:
        parsed_arguments['cache_file_path'] = config.get('latency_monitor',
                                                         'cache_file_path')
    except ConfigParser.NoOptionError:
        msg = "Please specify cache file path, Quitting swift uptime mon"
        logger.exception(msg)
        raise UPtimeMonException(msg)
    if parsed_arguments['user_name'] is None:
        message = "User name is mandatory, Quitting swift uptime mon"
        logger.exception(message)
        raise UPtimeMonException(message)
    if parsed_arguments['password'] is None:
        message = "Password is mandatory, Quitting swift uptime mon"
        logger.exception(message)
        raise UPtimeMonException(message)
    if parsed_arguments['keystone_auth_url'] is None:
        message = ("Keystone Auth URL is mandatory, "
                   "Unable to get this URL from OS_AUTH_URL "
                   "environment variable, Quitting swift uptime mon")
        logger.exception(message)
        raise UPtimeMonException(message)
    if parsed_arguments['project_id'] is None and \
            parsed_arguments['project_name'] is None:
        message = ("One of project_name or project_id must be specified, "
                   "Quitting swift uptime mon")
        logger.exception(message)
        raise UPtimeMonException(message)
    return parsed_arguments, logger


def main():
    args = parse_args(sys.argv[1:])
    # Ensure configuration file is present
    if not path.isfile(args.configFile):
        message = "Config file not present, Quitting swift uptime mon"
        raise Exception(message)
    config = ConfigParser.RawConfigParser()
    config.read(args.configFile)
    parsed_arguments, logger = validate_args(config)
    # if both project_id and project_name are specified use project_id
    if parsed_arguments['project_id']:
        if parsed_arguments['project_name']:
            del parsed_arguments['project_name']
    main_loop(parsed_arguments, logger)
    exit(0)


if __name__ == "__main__":
    main()
0707010000003B000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000004300000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/generic_hardware
0707010000003C000081A4000003E800000064000000015BE06E0300000000000000000000000000000000000000000000004F00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/generic_hardware/__init__.py
0707010000003D000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000003E00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/hp_hardware
0707010000003E000081A4000003E800000064000000015BE06E0300000000000000000000000000000000000000000000004A00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/hp_hardware/__init__.py
0707010000003F000081A4000003E800000064000000015BE06E0300004F9D000000000000000000000000000000000000004800000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/hp_hardware/ssacli.py
# (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017-2018 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
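The TextScanner class that follows builds a tree of blocks from the column-aligned output of ssacli by comparing indentation levels. A minimal, self-contained sketch of the same idea (the `scan` helper below is illustrative only, not part of the module's API; it uses an explicit stack instead of recursion):

```python
def scan(lines):
    """Turn column-aligned text into a nested [(text, children), ...]
    structure, grouping lines by indentation depth."""
    root = []
    stack = [(-1, root)]  # (indent, children-list) pairs
    for line in lines:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        # Pop back out to the block that encloses this indent level
        while indent <= stack[-1][0]:
            stack.pop()
        node = (line.strip(), [])
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root
```

As with TextScanner, a title line such as `Smart Array P410 in Slot 1` becomes a block, and the indented `Key: Value` lines beneath it become its subblocks.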
#
# Python library for running ssacli commands

import re
try:
    import configparser
except ImportError:
    import ConfigParser as configparser
from collections import OrderedDict

from swiftlm.utils.metricdata import MetricData
from swiftlm.utils.values import Severity
from swiftlm.utils.utility import run_cmd
from swiftlm import CONFIG_FILE

LOCK_FILE_COMMAND = '/usr/bin/flock -w 10 /var/lock/ssacli-swiftlm.lock '

BASE_RESULT = MetricData(
    name=__name__,
    messages={
        'no_battery': 'No cache battery',
        'unknown': 'ssacli command failed',
        'controller_status': '{sub_component} status is {status}',
        'in_hba_mode': 'Controller is in HBA mode; performance will be poor',
        'physical_drive': 'Drive {serial_number}: '
                          '{box}:{bay} has status: {status}',
        'l_drive': 'Logical Drive {logical_drive} has status: {status}',
        'l_cache': 'Logical Drive {logical_drive} has cache status: '
                   '{caching}',
        'ok': 'OK',
        'fail': 'FAIL',
    }
)

# This is all the data we are looking for in the ssacli output so we
# will _only_ gather whatever values are in this list
METRIC_KEYS = ['array', 'physicaldrive', 'logical_drive', 'caching',
               'serial_number', 'slot', 'firmware_version',
               'controller_mode', 'battery_capacitor_presence',
               'battery_capacitor_status', 'controller_status',
               'cache_status', 'box', 'bay', 'status',
               'ld acceleration method']


def indent_at(line):
    indent = 0
    for char in line:
        if char.isspace():
            indent += 1
        else:
            break
    return indent


class TextBlock(object):
    """
    Structure to represent the scanning that TextScanner performs.

    The members are as follows:

    text:
        A line of text. This can be a title or an attribute/value pair.

    subblocks:
        TextBlock objects that are "under" (indented) the above line.
    """
    def __init__(self, text):
        self.text = text
        self.subblocks = []

    def make_subblock(self, text):
        subblock = TextBlock(text)
        self.subblocks.append(subblock)
        return subblock


class TextScanner(object):
    """
    Scans blocks of text.

    This scanner processes column-aligned text into a block/subblock
    structure. The result is the TextBlock class, where each block is a
    line of text (title or attribute). Any text indented under the line
    is listed in the subblocks member (which in turn is another TextBlock
    object).
    """
    def __init__(self, lines):
        self.line_index = None
        self.root_block = TextBlock('root')
        self.scan_text_blocks(lines, -1, self.root_block)

    def scan_text_blocks(self, lines, indent, block):
        '''
        Scan blocks of text recursively.

        Normally each line is scanned once (incrementing self.line_index).
        However, if we're called and realise the line is an outer block
        (because its indent is less than the current indent), we decrement
        line_index so that we reprocess that line.

        :param lines: an array of lines of text
        :param indent: the current indentation level
        :param block: the outer block
        '''
        if self.line_index is None:
            self.line_index = 0
        while self.line_index < len(lines):
            line = lines[self.line_index]
            self.line_index += 1
            if not line or line.isspace():
                continue
            if indent_at(line) <= indent:
                # Text is an outer block -- our block has ended
                self.line_index += -1
                return
            # Text at same level as our peers -- save the text
            subblock = block.make_subblock(line.strip())
            # Look for more lines
            self.scan_text_blocks(lines, indent_at(line), subblock)

    def get_root_block(self):
        return self.root_block


def parse_array_name(text):
    return text.split()[0], text.split()[1].strip(), text


def parse_controller_name(text):
    model = text.strip().split("in Slot")[0].strip()
    controller_key = text.strip().split("(Embedded)")[0].strip()
    return model, controller_key


def parse_ld_name(text):
    return parse_attribute(text, underscoring=False)


def parse_attribute(text, underscoring=True):
    try:
        k, v = text.split(': ', 1)
    except ValueError:
        raise
    k = k.strip().lower()
    if underscoring:
        k = re.sub('[/|() ]', '_', k)
    return k, v.strip()


def parse_cont_attribute(attribute, info, slots=None):
    try:
        k, v = parse_attribute(attribute.text)
    except ValueError:
        raise
    # Guard against callers (e.g. physical drive parsing) that do not
    # pass a slots list
    if 'slot' in k and slots is not None:
        slots.append(v)
    if 'battery_capacitor_count' in k:
        k = 'battery_capacitor_presence'
    if any(k in s for s in METRIC_KEYS):
        info.update({k: v})


def get_smart_array_info():
    """
    Function entry point used by cinderlm
    """
    return get_controller_info()


def get_controller_info():
    """
    Parses controller data from ssacli in the form shown below.

    Returns a dict. Keys are lowercased versions of the key name on each
    line, including special characters. Values are not changed. The keys
    'model' and 'slot' are parsed from the first line.

    Smart Array P410 in Slot 1
       Bus Interface: PCI
       Slot: 1
       Serial Number: PACCR0M9VZ41S4Q
       Cache Serial Number: PACCQID12061TTQ
       RAID 6 (ADG) Status: Disabled
       Controller Status: OK
       Hardware Revision: C
       Firmware Version: 6.60
       Rebuild Priority: Medium
       Expand Priority: Medium
       Surface Scan Delay: 15 secs
       Surface Scan Mode: Idle
       Queue Depth: Automatic
       Monitor and Performance Delay: 60 min
       Elevator Sort: Enabled
       Degraded Performance Optimization: Disabled
       Inconsistency Repair Policy: Disabled
       Wait for Cache Room: Disabled
       Surface Analysis Inconsistency Notification: Disabled
       Post Prompt Timeout: 15 secs
       Cache Board Present: True
       Cache Status: OK
       Cache Ratio: 25% Read / 75% Write
       Drive Write Cache: Disabled
       Total Cache Size: 256 MB
       Total Cache Memory Available: 144 MB
       No-Battery Write Cache: Disabled
       Cache Backup Power Source: Batteries
       Battery/Capacitor Count: 1
       Battery/Capacitor Status: OK
       SATA NCQ Supported: True
       Number of Ports: 2 Internal only
       Encryption Supported: False
       Driver Name: hpsa
       Driver Version: 3.4.0
       Driver Supports HP SSD Smart Path: False

    Smart Array P440ar in Slot 0 (Embedded) (HBA Mode)
       Bus Interface: PCI
       Slot: 0
       Serial Number: PDNLH0BRH7V7GC
       Cache Serial Number: PDNLH0BRH7V7GC
       Controller Status: OK
       Hardware Revision: B
       Firmware Version: 2.14
       Controller Temperature (C): 50
       Number of Ports: 2 Internal only
       Driver Name: hpsa
       Driver Version: 3.4.4
       HBA Mode Enabled: True
       PCI Address (Domain:Bus:Device.Function): 0000:03:00.0
       Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
       Controller Mode: HBA
       Controller Mode Reboot: Not Required
       Current Power Mode: MaxPerformance
       Host Serial Number: MXQ51906YF
    """
    results = []
    controller_slots = []
    controller_result = BASE_RESULT.child()
    controller_result.name += '.' + 'smart_array'
    rc = run_cmd(LOCK_FILE_COMMAND + 'ssacli ctrl all show detail')
    if rc.exitcode != 0:
        if 'Error: No controllers detected.' in str(rc.output):
            return [[], []]
        if len(rc.output) > 1847:
            rc = rc._replace(exitcode=rc.exitcode,
                             output='...' + rc.output[-1844:])
        raise Exception('{0}: ssacli ctrl all show detail '
                        'failed with exit code: {1}'.format(
                            rc.output, rc.exitcode))
    if rc.output:
        lines = rc.output.split('\n')
    else:
        raise Exception('{0}: ssacli ctrl all show detail '
                        'failed with exit code: {1}'.format(
                            rc.output, rc.exitcode))
    info = []
    text_scanner = TextScanner(lines)
    root = text_scanner.get_root_block()
    c_info = None
    # Extract controller information
    for controller in root.subblocks:
        line = controller.text
        if line.startswith('Smart Array') or line.startswith('Smart HBA'):
            model, _ = parse_controller_name(line)
            c_info = {'model': model}
            info.append(c_info)
            # Process controller attributes
            for attribute in controller.subblocks:
                parse_cont_attribute(attribute, c_info, controller_slots)
        elif line.startswith('CACHE STATUS'):
            for attribute in controller.subblocks:
                # Process controller attributes
                att = attribute.text
                if ': ' in att and c_info:
                    parse_cont_attribute(attribute, c_info,
                                         controller_slots)
        else:
            # Unknown controller type
            continue
    # Walk dictionary to gather controller metrics
    for c_info in info:
        results.extend(check_controller(c_info, controller_result))
    return results, controller_slots


def check_controller(c, base):
    results = []
    base = base.child(dimensions={
        'model': c.get('model', 'NA'),
        'controller_slot': c.get('slot', 'NA'),
        'component': 'controller',
    })
    # Firmware version
    try:
        f = c.get('firmware_version', '0')
        f = float(f)
    except ValueError:
        f = 0
    r = base.child()
    r.name += '.' + 'firmware'
    r.value = f
    results.append(r)
    # Check for HBA mode
    try:
        hba_mode = c.get('controller_mode', 'not-hba')
    except ValueError:
        hba_mode = 'not-hba'
    r = base.child()
    r['sub_component'] = 'controller_not_hba_mode'
    if hba_mode == 'HBA':
        r.value = Severity.fail
        r.message = 'in_hba_mode'
        results.append(r)
        return results  # no point in looking at cache, battery, etc.
    else:
        r.value = Severity.ok
        results.append(r)
    # Battery presence
    try:
        bcp = c.get('battery_capacitor_presence', '0')
        bcp = int(bcp)
    except ValueError:
        bcp = 0
    r = base.child()
    r['sub_component'] = 'battery_capacitor_presence'
    if bcp < 1:
        r.value = Severity.fail
        r.message = 'no_battery'
    else:
        r.value = Severity.ok
    results.append(r)
    # Statuses
    for i in ('controller_status', 'cache_status',
              'battery_capacitor_status'):
        s = c.get(i, 'NA')
        r = base.child()
        r['sub_component'] = i
        r.msgkey('status', s)
        if s != 'OK':
            r.value = Severity.fail
            r.message = 'controller_status'
        else:
            r.value = Severity.ok
        results.append(r)
    return results


def get_physical_drive_info(controller_slot):
    """
    Parses drive data from ssacli in the form shown below. There may be
    multiple drives in the output.

    Smart Array P410 in Slot 1

       array A

          physicaldrive 2C:1:1
             Port: 2C
             Box: 1
             Bay: 1
             Status: OK
             Drive Type: Data Drive
             Interface Type: SAS
             Size: 2 TB
             Native Block Size: 512
             Rotational Speed: 7200
             Firmware Revision: HPD3
             Serial Number: YFJMHTZD
             Model: HP MB2000FBUCL
             Current Temperature (C): 27
             Maximum Temperature (C): 38
             PHY Count: 2
             PHY Transfer Rate: 6.0Gbps, Unknown
    """
    results = []
    drive_result = BASE_RESULT.child(dimensions={
        'controller_slot': str(controller_slot)
    })
    drive_result.name += '.physical_drive'
    rc = run_cmd(
        LOCK_FILE_COMMAND +
        'ssacli ctrl slot=%s pd all show detail' % controller_slot)
    if rc.exitcode != 0:
        if len(rc.output) > 1847:
            rc = rc._replace(exitcode=rc.exitcode,
                             output='...' + rc.output[-1844:])
        raise Exception('{0}: ssacli ctrl slot={1} pd all show detail '
                        'failed with exit code: {2}'.format(
                            rc.output, controller_slot, rc.exitcode))
    lines = rc.output.split('\n')
    if lines == ['']:
        raise Exception('{0}: ssacli ctrl slot={1} pd all show detail '
                        'failed with exit code: {2}'.format(
                            rc.output, controller_slot, rc.exitcode))
    drive_info = []
    text_scanner = TextScanner(lines)
    root = text_scanner.get_root_block()
    # Extract drive information
    for controller in root.subblocks:
        line = controller.text
        if line.startswith("Smart Array"):
            _, controller_key = parse_controller_name(line)
            for assignment in controller.subblocks:
                line = assignment.text
                if "array" in line:
                    # drives assigned to a LUN
                    pass
                elif "hba drives" in line.lower():
                    # controller in HBA mode
                    pass
                elif "unassigned" in line.lower():
                    # Unassigned drives are probably unassigned for a
                    # reason (such as failed) so we'll ignore them
                    continue
                else:
                    # Unrecognised assignment - ignore
                    continue
                for pd in assignment.subblocks:
                    # Parse drive attributes
                    pd_data = {}
                    for attribute in pd.subblocks:
                        parse_cont_attribute(attribute, pd_data)
                    drive_info.append(pd_data)
    # Now walk drive_info to get metrics' data from the controller(s),
    # array(s), physical drive(s), and logical drive(s)
    for pd_data in drive_info:
        results.extend(check_physical_drive(pd_data, drive_result))
    return results


def check_physical_drive(d, base):
    r = base.child(dimensions={
        'box': d.get('box', 'NA'),
        'bay': d.get('bay', 'NA'),
        'component': 'physical_drive'},
        msgkeys={'status': d.get('status', 'NA'),
                 'serial_number': d.get('serial_number', 'NA')})
    if d.get('status', 'NA') != 'OK':
        r.value = Severity.fail
        r.message = 'physical_drive'
    else:
        r.value = Severity.ok
    return [r]


def get_logical_drive_info(controller_slot, cache_check=True):
    """
    array L

       Logical Drive: 12
          Size: 1.8 TB
          Fault Tolerance: 0
          Heads: 255
          Sectors Per Track: 32
          Cylinders: 65535
          Strip Size: 256 KB
          Full Stripe Size: 256 KB
          Status: OK
          Caching: Enabled
          Unique
Identifier: 600508B1001CEA938043498011A76404 Disk Name: /dev/sdl Mount Points: /srv/node/disk11 1.8 TB Partition Number 2 OS Status: LOCKED Logical Drive Label: AF3C73D8PACCR0M9VZ41S4QEB69 Drive Type: Data LD Acceleration Method: Controller Cache """ results = [] drive_result = BASE_RESULT.child(dimensions={ 'controller_slot': controller_slot}) drive_result.name += '.' + 'logical_drive' rc = run_cmd( LOCK_FILE_COMMAND + 'ssacli ctrl slot=%s ld all show detail' % controller_slot) if rc.exitcode != 0: if len(rc.output) > 1847: rc = rc._replace(exitcode=rc.exitcode, output='...' + rc.output[-1844:]) raise Exception('{0}: ssacli ctrl slot={1} ld all show detail ' 'failed with exit code: {2}'.format( rc.output, controller_slot, rc.exitcode)) lines = rc.output.split('\n') if lines == ['']: raise Exception('{0}: ssacli ctrl slot={1} ld all show detail ' 'failed with exit code: {2}'.format( rc.output, controller_slot, rc.exitcode)) drive_info = [] text_scanner = TextScanner(lines) root = text_scanner.get_root_block() # Extract logical drive information for controller in root.subblocks: line = controller.text if line.startswith("Smart Array"): for array in controller.subblocks: line = array.text if "array" in line: _, array_letter, array_name = parse_array_name(line) for lun in array.subblocks: line = lun.text if "Logical Drive:" in line: logical_drive = line.strip() try: _, ld_num = parse_ld_name(line) ld_data = {'array': array_letter, 'logical_drive': ld_num} except ValueError: continue for attribute in lun.subblocks: line = attribute.text try: k, v = parse_attribute(line, underscoring=False) except ValueError: continue if any(k in s for s in METRIC_KEYS): ld_data.update({k: v}) drive_info.append(ld_data) # Now walk the LUNs and check them for ld_data in drive_info: results.extend(check_logical_drive(ld_data, drive_result, cache_check)) return results def check_logical_drive(d, base, cache_check): results = [] base = base.child(dimensions={ 'component': 'logical_drive', 
'array': d.get('array', 'NA'), 'logical_drive': d.get('logical_drive', 'NA')}, msgkeys={'status': d.get('status', 'NA'), 'caching': d.get('caching', 'NA')}) r = base.child() r['sub_component'] = 'lun_status' if d.get('status', 'NA') != 'OK': r.value = Severity.fail r.message = 'l_drive' else: r.value = Severity.ok results.append(r) if cache_check: r = base.child() r['sub_component'] = 'cache_status' cache_must_be_enabled = True if d.get('ld acceleration method', 'NA') == 'HPE SSD Smart Path': cache_must_be_enabled = False if cache_must_be_enabled and d.get('caching', 'NA') != 'Enabled': r.value = Severity.fail r.message = 'l_cache' else: r.value = Severity.ok results.append(r) return results def main(): """Check controller and drive information with ssacli""" cache_check = True try: cp = configparser.RawConfigParser() cp.read(CONFIG_FILE) cc = cp.getboolean('ssacli', 'check_cache') if not cc: cache_check = False except Exception: pass results, controller_slots = get_controller_info() for controller_slot in controller_slots: results.extend(get_physical_drive_info(controller_slot)) results.extend(get_logical_drive_info(controller_slot, cache_check=cache_check)) return results 
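The main() function above disables the logical-drive cache check when an optional [ssacli] section sets check_cache to false. A minimal sketch of that configparser lookup, using an in-memory config string (the real code reads CONFIG_FILE from disk):

```python
import configparser

# Hypothetical content of the swiftlm config file read by main().
sample = """
[ssacli]
check_cache = false
"""

cp = configparser.RawConfigParser()
cp.read_string(sample)  # main() uses cp.read(CONFIG_FILE) instead
cache_check = cp.getboolean('ssacli', 'check_cache')
print(cache_check)  # False -> logical-drive cache checks are skipped
```

If the file, section, or option is missing, getboolean raises, which is why main() wraps the lookup in a broad try/except and defaults to checking the cache.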
07070100000040000041ED000003E800000064000000045BE06E0300000000000000000000000000000000000000000000003A00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/monasca
07070100000041000081A4000003E800000064000000015BE06E0300000000000000000000000000000000000000000000004600000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/monasca/__init__.py
07070100000042000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000004800000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/monasca/check_plugins
07070100000043000081A4000003E800000064000000015BE06E0300000000000000000000000000000000000000000000005400000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/monasca/check_plugins/__init__.py
07070100000044000081A4000003E800000064000000015BE06E03000043B4000000000000000000000000000000000000005900000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/monasca/check_plugins/swiftlm_check.py# (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017-2018 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
# In case you are tempted to import from non-built-in libraries, think twice:
# this module will be imported by monasca-agent which must therefore be able
# to import any dependent modules.
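One of the task types implemented by the check plugin in swiftlm_check.py loads metrics from files that contain a JSON-encoded list of metric dicts. A hypothetical example of such a payload (the field names mirror the metrics built elsewhere in this module: metric, dimensions, value, value_meta, timestamp; the metric name itself is made up):

```python
import json
import time

metrics = [{
    'metric': 'swiftlm.example.check',  # hypothetical metric name
    'dimensions': {'service': 'object-storage'},
    'value': 0,                         # 0 == OK in this module
    'value_meta': {'msg': 'Ok'},
    'timestamp': time.time(),
}]

# Round-trip the payload the way _run_load_file_task() expects to find it
payload = json.dumps(metrics)
loaded = json.loads(payload)
print(loaded[0]['value'])  # 0
```

The timestamp field matters: the plugin discards metrics older than METRIC_STALE_AGE, so a producer that stops updating its file is detected rather than silently re-reported.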
from collections import defaultdict
import json
import pkg_resources

try:
    from monasca_agent.collector.checks import AgentCheck as AgentCheck
except IOError:
    # Unit tests do not have an /etc/monasca/agent/agent.yaml file
    class AgentCheck(object):
        def __init__(self, name, init_config, agent_config, instances):
            self.log = None

import socket
import subprocess
import threading

# This module provides a check plugin class for monasca-agent. The plugin
# runs 'tasks' each of which generates one or more metrics that are reported
# to the monasca-agent daemon. There are three types of tasks:
#
# 1. Load metrics from files: file names may be specified via an ansible
#    playbook task that deploys the associated swiftlm_detect detect plugin.
#    Files should contain json encoded lists of metric dicts.
#
# 2. Run a swiftlm-scan command line.
#
# 3. Run a python function found from a list of entry points.
#
# If any task times out or raises an exception then the check plugin itself
# will report a metric to that effect.

import fcntl
import errno
import time
import os

OK = 0
WARN = 1
FAIL = 2
UNKNOWN = 3

# name used for metrics reported directly by this module e.g. when a task
# fails or times out. (we need to hard code this name rather than use the
# module name because the module name reported by __name__ is dependent on
# how monasca-agent imports the module)
MODULE_METRIC_NAME = 'swiftlm.swiftlm_check'

# Directory to which Monasca Agent can write
POSTED_DIR = '/tmp'

# Assumes metrics file written every 60 seconds
METRIC_STALE_AGE = 60 * 4  # These are too old to report
POSTED_STALE_AGE = METRIC_STALE_AGE * 2  # Keep in posted file until this old


def _take_shared_lock(fd):
    # attempt to take a shared lock on fd, raising IOError if
    # lock cannot be taken after a number of attempts
    max_attempts = 5
    delay = 0.02
    attempts = 0
    while True:
        attempts += 1
        try:
            fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
            break
        except IOError as err:
            if (err.errno != errno.EWOULDBLOCK or
                    attempts > max_attempts):
                raise
            time.sleep(delay * attempts)


class CommandRunner(object):
    def __init__(self, command):
        self.command = command
        self.stderr = self.stdout = self.returncode = self.exception = None
        self.timed_out = False
        self.process = None

    def run_with_timeout(self, timeout):
        thread = threading.Thread(target=self.run_subprocess)
        thread.start()
        thread.join(timeout)
        if thread.is_alive():
            self.timed_out = True
            if self.process:
                self.process.terminate()

    def run_subprocess(self):
        try:
            self.process = subprocess.Popen(
                self.command,
                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            self.stdout, self.stderr = self.process.communicate()
            self.returncode = self.process.returncode
        except Exception as e:  # noqa
            self.exception = e


class SwiftLMScan(AgentCheck):

    # set of check tasks implemented by swiftlm
    TASKS = (
        'replication', 'file-ownership', 'drive-audit', 'swift-services',
        'check-mounts', 'connectivity', 'system', 'ssacli'
    )

    # we explicitly list the entry points to be called rather than just use
    # whatever is found via pkg_resources.get_entry_map() because (a) we'd
    # like to know if an entry point is missing and (b) we may have
    # experimental swiftlm entry points installed that should not just
    # automatically run in this plugin.
    TASK_ENTRY_POINTS = []
    # uncomment if entry points are to be used...
    # TASK_ENTRY_POINTS = TASKS

    # command args to be used for all calls to shell commands
    COMMAND_ARGS = ['sudo', 'swiftlm-scan', '--format', 'json']
    COMMAND_TIMEOUT = 15.0
    SUBCOMMAND_PREFIX = '--'
    # list of sub-commands each of which is appended to a shell command
    # with the prefix added
    DEFAULT_SUBCOMMANDS = TASKS

    # list of tasks for which any 'ok' metrics will NOT be reported
    DEFAULT_SUPPRESS_OK = []

    def __init__(self, name, init_config, agent_config, instances=None,
                 logger=None):
        super(SwiftLMScan, self).__init__(
            name, init_config, agent_config, instances)
        self.log = logger or self.log
        self.plugin_tasks = {}
        self._load_plugin_tasks()
        self.plugin_failures = []

    def _plugin_failed(self, typ, item, msg):
        self.plugin_failures.append('%s: %s: %s' % (typ, item, msg))

    def _plugin_check_metric(self):
        """ Generate metric to report that a task has raised an exception. """
        if self.plugin_failures:
            msg = ', '.join(self.plugin_failures)
            if len(msg) > 2047:
                msg = msg[:2044] + '...'
            return dict(
                metric=MODULE_METRIC_NAME,
                dimensions={'service': 'object-storage'},
                value_meta=dict(msg=msg),
                value=FAIL)
        else:
            msg = 'Ok'
            return dict(
                metric=MODULE_METRIC_NAME,
                dimensions={'service': 'object-storage'},
                value_meta=dict(msg=msg),
                value=OK)

    def _load_plugin_tasks(self):
        dist, group = 'swiftlm', 'swiftlm.plugins'
        try:
            self.plugin_tasks = pkg_resources.get_entry_map(dist, group)
        except pkg_resources.DistributionNotFound:
            self.log.warn('No plugins found for %s, %s' % (dist, group))

    def log_summary(self, task_type, summary):
        task_count = len(summary.get('tasks', []))
        if task_count == 1:
            msg = 'Ran 1 %s task.' % task_type
        else:
            msg = 'Ran %d %s tasks.' % (task_count, task_type)
        # suppress log noise if no tasks were configured
        logger = self.log.info if task_count else self.log.debug
        if summary:
            msg += ' Metrics summary: {'
            m = {'total': 'Total', OK: 'Ok', WARN: 'Warn', FAIL: 'Fail',
                 UNKNOWN: 'Unknown'}
            for key in ('total', OK, WARN, FAIL, UNKNOWN):
                msg += '%s: %d, ' % (m[key], len(summary.get(key, [])))
            msg += '}'
        logger(msg)

    def _run_entry_point_task(self, task_name):
        # why is this here? If swiftlm entry points are available to
        # monasca-agent (e.g. installed in the venv) then we can call
        # the task entry points directly using this method. Otherwise, this
        # method is useful for testing - tests can call entry points with
        # mocked environments to verify the entry point behaviors.
        metrics = []
        task_func = self.plugin_tasks.get(task_name)
        if task_func:
            try:
                task_func = task_func.load()
                results = task_func()
                if not isinstance(results, list):
                    results = [results]
                for result in results:
                    # dirty hack to allow swiftlm MetricData objects to
                    # be consumed without importing from swiftlm
                    if hasattr(result, 'metric'):
                        metric = result.metric()
                        metrics.append(metric)
                    elif not isinstance(result, dict):
                        self.log.warn(
                            'Unexpected return type "%s" from task "%s"'
                            % (type(result).__name__, task_name))
            except Exception as e:  # noqa
                self.log.warn('Plugin task "%s" failed with "%s"'
                              % (task_name, e))
                self._plugin_failed('plugin', task_name, e)
        else:
            self._plugin_failed('plugin', task_name, 'missing entry point')
        return metrics

    def _run_command_line_task(self, task_name):
        # why is this here? If swiftlm entry points are not available to
        # monasca-agent then we have to call out to a command line that
        # can find the entry points.
        command = list(self.COMMAND_ARGS)
        command.append(self.SUBCOMMAND_PREFIX + task_name)
        cmd_str = ' '.join(command)
        runner = CommandRunner(command)
        metrics = []
        try:
            runner.run_with_timeout(self.COMMAND_TIMEOUT)
        except Exception as e:  # noqa
            self.log.warn('Command "%s" failed with "%s"' % (cmd_str, e))
            self._plugin_failed('command', task_name, e)
        else:
            if runner.exception:
                self.log.warn('Command "%s" failed with "%s"'
                              % (cmd_str, runner.exception))
                self._plugin_failed('command', cmd_str, runner.exception)
            elif runner.timed_out:
                self.log.warn('Command "%s" timed out after %ss'
                              % (cmd_str, self.COMMAND_TIMEOUT))
                self._plugin_failed('command', cmd_str, 'timed out')
            elif runner.returncode:
                self.log.warn('Command "%s" failed with status %s'
                              % (cmd_str, runner.returncode))
                self._plugin_failed('command', cmd_str,
                                    '%s' % runner.returncode)
            else:
                try:
                    metrics = json.loads(runner.stdout)
                except (ValueError, TypeError) as e:
                    self.log.warn('Failed to parse json: %s' % e)
                    self._plugin_failed('command', cmd_str,
                                        'failed to parse json')
        return metrics

    def _run_load_file_task(self, file_path):
        metrics = []
        try:
            with open(file_path, 'r') as f:
                _take_shared_lock(f.fileno())
                f.seek(0)
                metrics = json.load(f)
        except (ValueError, TypeError) as e:
            self.log.warn('Loading file "%s" failed parsing json: %s'
                          % (file_path, e))
            self._plugin_failed('file', file_path,
                                'failed parsing json: %s' % e)
        except Exception as e:  # noqa
            self.log.warn('Loading file "%s" failed with "%s"'
                          % (file_path, e))
            self._plugin_failed('file', file_path, 'loading error: %s' % e)
        return self._remove_duplicate_metrics(metrics, file_path)

    def _remove_duplicate_metrics(self, metrics, file_path):
        """
        Remove metrics if we've already reported them

        We track the metrics we return to the Monasca Agent in a "posted"
        file. This allows us to discard duplicate metrics. We also discard
        metrics that seem stale. This can occur when the program creating
        the metrics file has died, so the metrics file does not update with
        new metrics.

        The posted file contains recently posted metrics. The file is read,
        then re-written on each cycle. Metrics older than POSTED_STALE_AGE
        are removed from the posted file (so it does not grow forever).

        :param metrics: The metrics we found in the metrics file
        :param file_path: the path of the metrics file -- this is used to
            derive an appropriate name for the posted file
        :returns: A list of metrics that should be posted
        """
        stale_metrics = False
        posted_metrics = []
        file_name = os.path.split(file_path)[1]
        pfile_path = os.path.join(POSTED_DIR, file_name) + '.posted'
        try:
            with open(pfile_path, 'r') as p:
                posted_metrics = json.load(p)
        except Exception as e:  # noqa
            # This is normal when program first runs (no file exists)
            self.log.warn('Loading file "%s" failed: %s' % (pfile_path, e))

        # Purge already posted and stale metrics
        for metric in list(metrics):
            if metric in posted_metrics:
                metrics.remove(metric)
            elif (time.time() - metric.get('timestamp')) > METRIC_STALE_AGE:
                metrics.remove(metric)
                stale_metrics = True
            else:
                posted_metrics.append(metric)

        # Purge really old metrics from posted file
        for metric in list(posted_metrics):
            if (time.time() - metric.get('timestamp')) > POSTED_STALE_AGE:
                posted_metrics.remove(metric)
        try:
            with open(pfile_path, 'w') as p:
                json.dump(posted_metrics, p)
        except Exception as e:  # noqa
            self.log.warn('Dumping file "%s" failed: %s' % (pfile_path, e))

        if stale_metrics:
            self.log.warn('Metrics are older than %s seconds;'
                          ' file not updating?: %s '
                          % (METRIC_STALE_AGE, file_path))
            self._plugin_failed('file', file_path, 'stale metrics')
        return metrics

    def _is_reported(self, task_name, metric):
        # filter out 'suppress_ok' metrics
        if task_name in self.suppress_ok and metric.get('value') == OK:
            return False
        return True

    def _get_metrics(self, task_names, task_runner):
        reported = []
        summary = defaultdict(list)
        for task_name in task_names:
            summary['tasks'].append(task_name)
            metrics = task_runner(task_name)
            if not isinstance(metrics, list):
                metrics = [metrics]
            for metric in metrics:
                summary[metric.get('value')].append(metric)
                summary['total'].append(metric)
                if self._is_reported(task_name, metric):
                    reported.append(metric)
        return reported, summary

    def _csv_to_list(self, csv):
        return [f.strip() for f in csv.split(',') if f]

    def _load_instance_config(self, instance):
        self.log.debug('instance config %s' % str(instance))
        self.metrics_files = self._csv_to_list(
            instance.get('metrics_files', ''))
        self.log.debug('Using metrics files %s' % str(self.metrics_files))
        if instance.get('subcommands') is None:
            self.subcommands = self.DEFAULT_SUBCOMMANDS
        else:
            self.subcommands = self._csv_to_list(instance.get('subcommands'))
        self.log.debug('Using subcommands %s' % str(self.subcommands))
        if instance.get('suppress_ok') is None:
            self.suppress_ok = self.DEFAULT_SUPPRESS_OK
        else:
            self.suppress_ok = self._csv_to_list(instance.get('suppress_ok'))
        self.plugin_failures = []

    def check(self, instance):
        self._load_instance_config(instance)

        # run entry point tasks
        all_metrics, summary = self._get_metrics(
            self.TASK_ENTRY_POINTS, self._run_entry_point_task)
        self.log_summary('entry point', summary)

        # run command line tasks
        metrics, summary = self._get_metrics(
            self.subcommands, self._run_command_line_task)
        self.log_summary('command', summary)
        all_metrics.extend(metrics)

        # run load file tasks
        metrics, summary = self._get_metrics(
            self.metrics_files, self._run_load_file_task)
        self.log_summary('load file', summary)
        all_metrics.extend(metrics)

        # plugin status
        all_metrics.extend([self._plugin_check_metric()])

        for metric in all_metrics:
            # apply any instance dimensions that may be configured,
            # overriding any dimension with same key that check has set.
            metric['dimensions'] = self._set_dimensions(metric['dimensions'],
                                                        instance)
            self.log.debug(
                'metric %s %s %s %s %s'
                % (metric.get('timestamp'), metric.get('metric'),
                   metric.get('value'), metric.get('value_meta'),
                   metric.get('dimensions')))
            try:
                self.gauge(**metric)
            except Exception as e:  # noqa
                self.log.exception('Exception while reporting metric: %s' % e)
07070100000045000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000004900000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/monasca/detect_plugins
07070100000046000081A4000003E800000064000000015BE06E0300000000000000000000000000000000000000000000005500000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/monasca/detect_plugins/__init__.py
07070100000047000081A4000003E800000064000000015BE06E0300000902000000000000000000000000000000000000005B00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/monasca/detect_plugins/swiftlm_detect.py# (c) Copyright 2015 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
from monasca_setup import agent_config
import monasca_setup.detection
from monasca_setup.detection.utils import _get_dimensions
import os


class SwiftLMDetect(monasca_setup.detection.ArgsPlugin):
    """ Detect if swiftlm-scan will be monitoring.
""" SWIFTLM_DIR = '/etc/swiftlm' CHECK_NAME = 'swiftlm_check' def __init__(self, template_dir, overwrite=True, args=None): super(SwiftLMDetect, self).__init__( template_dir, overwrite, args) def _detect(self): """ Run detection, set self.available True if any swift config is detected. (Called during superclass __init__). """ self.available = False conf_file = 'swiftlm-scan.conf' if os.path.isfile(os.path.join(self.SWIFTLM_DIR, conf_file)): self.available = True def build_config(self): """ Build the config as a Plugins object and return. """ config = agent_config.Plugins() parameters = {'name': self.CHECK_NAME} if self.args: for arg in ('metrics_files', 'subcommands', 'suppress_ok'): if arg in self.args: parameters[arg] = self.args.get(arg) # set service and component dimensions = _get_dimensions('object-storage', None) if len(dimensions) > 0: parameters['dimensions'] = dimensions config[self.CHECK_NAME] = {'init_config': None, 'instances': [parameters]} return config def dependencies_installed(self): """ Return True if dependencies are installed. """ return True 07070100000048000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000003800000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/rings07070100000049000081A4000003E800000064000000015BE06E0300000000000000000000000000000000000000000000004400000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/rings/__init__.py0707010000004A000081A4000003E800000064000000015BE06E0300003B1C000000000000000000000000000000000000004800000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/rings/ardana_model.py# (c) Copyright 2015, 2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. 
You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # from swiftlm.rings.ring_model import DeviceInfo, Consumes, SwiftModelException from swiftlm.utils.drivedata import DISK_MOUNT, LVM_MOUNT class ServersModel(object): """ Access the drive data in the input model The input model looks like: global: all_servers: - name: server1 server_group_list: - AZ1 - RACK1 network_names: - server1-ardana - server1-mgmt - server1-obj disk_model: device_groups: - consumer: name: swift attrs: rings: - object-0 # Allows raw name or - name: object-0 # name as key (future # extension devices: - /dev/sda - /dev/sdb - .etc... - consumer: name: # We ignore attrs: devices: The main output is from iter_devices(). This returns a list of all swift drives with following items: cloud The cloud the server is in (added for audit purposes -- the server_name is used in ring building) control_plane The control plane the server is in (also for audit purposes) region_id The swift region id (e.g., 1) zone_id The swift zone id (e.g., 2) server_name The ardana ansible name/host of the server network_names The names of of the server on each of the networks server_ip The IP address of the server server_bind_port: The port number to use (e.g., 6000) server_groups: List of server groups associated with the server containing drive replication_ip The IP address of the drive on the replication network. Or None if no replication network replication_bind_port Port to use if a replication network is used swift_drive_name Name used in ring files (e.g. 
swdisk1) device_name The name of the device (e.g., /dev/sdb) ring_name The ringname (e.g., object-1) group_type 'device' or 'volume' presence: Currently always 'present' (because in model it is either there or not. Pass-through is used to signal draining and removal.) """ def __init__(self, cloud, control_plane, config=None, consumes_model=None): ''' :param cloud: used in unit tests :param control_plane: used in unit tests :param config: used in unit tests :param consumes_model: used in unit tests :return: ''' self.servers = [] self.consumes = None if consumes_model: # Used by unit tests self.register_consumes(Consumes(consumes_model)) # Unit tests can load up a single site here; supervisor uses # add_servers(). if config: servers = [] if config.get('control_plane_servers'): servers = config.get('control_plane_servers') elif config.get('global'): if config.get('global').get('all_servers'): servers = config.get('global').get('all_servers') self.add_servers(cloud, control_plane, servers) def add_servers(self, cloud, control_plane, servers): for server in servers: server['cloud'] = cloud server['control_plane'] = control_plane self.servers.extend(servers) def register_consumes(self, consumes): self.consumes = consumes def get_num_devices(self, ring_name): num_devices = 0 for device in self.iter_devices(): if device.ring_name == ring_name: num_devices += 1 return num_devices def iter_devices(self): for device_info in self._iter_device_groups(): yield device_info for device_info in self._iter_volume_groups(): yield device_info def _iter_device_groups(self): for server in self.servers: server_name = server.get('ardana_ansible_host', server.get('name')) cloud = server.get('cloud') control_plane = server.get('control_plane') network_names = server.get('network_names') disk_model = server.get('disk_model') server_groups = server.get('server_group_list', []) device_index = 0 for device_group in disk_model.get('device_groups', []): consumer = device_group.get('consumer') if 
consumer and consumer.get('name', 'other') == 'swift': attrs = consumer.get('attrs') if not attrs: raise SwiftModelException('The attrs item is' ' missing from device-groups' ' %s in disk model %s' % (device_group.get('name'), disk_model.get('name'))) devices = device_group.get('devices') if not attrs.get('rings'): raise SwiftModelException('The rings item is' ' missing from device-groups' ' %s in disk model %s' % (device_group.get('name'), disk_model.get('name'))) for device in devices: for ring in attrs.get('rings'): if isinstance(ring, str): ring_name = ring else: ring_name = ring.get('name') server_ip, bind_port = self._get_server_bind( ring_name, network_names) if not server_ip: # When a swift service (example swift-account) # is configured in the input model to run a # node, we expect the node to be in the # "consumes" variable. e.g., consumes_SWF_ACC # should have this node in its list. Since we # failed to get the network name/port, it means # that it is not. # In model terms, we have a disk model that # calls out that a device hosts a ring (e.g. # account), but the node is not configured # to run SWF-ACC. 
# TODO: this may be worth warning break swift_drive_name = DISK_MOUNT + str(device_index) device_info = DeviceInfo({ 'cloud': cloud, 'control_plane': control_plane, 'server_groups': server_groups, 'region_id': 1, # later, the server group may 'zone_id': 1, # change these defaults 'server_name': server_name, 'network_names': network_names, 'server_ip': server_ip, 'server_bind_port': bind_port, 'replication_ip': server_ip, 'replication_bind_port': bind_port, 'swift_drive_name': swift_drive_name, 'device_name': device.get('name'), 'ring_name': ring_name, 'group_type': 'device', 'block_devices': {'percent': '100%', 'physicals': [device.get('name')]}, 'presence': 'present'}) yield device_info device_index += 1 def _iter_volume_groups(self): for server in self.servers: server_name = server.get('ardana_ansible_host', server.get('name')) cloud = server.get('cloud') control_plane = server.get('control_plane') network_names = server.get('network_names') disk_model = server.get('disk_model') server_groups = server.get('server_group_list', []) lv_index = 0 for volume_group in disk_model.get('volume_groups', []): vg_name = volume_group.get('name') physical_volumes = volume_group.get('physical_volumes') for logical_volume in volume_group.get('logical_volumes', []): lv_name = logical_volume.get('name') percent = logical_volume.get('size') consumer = logical_volume.get('consumer') if consumer and consumer.get('name', 'other') == 'swift': attrs = consumer.get('attrs') if not attrs: raise SwiftModelException('The attrs item is' ' missing from ' ' logical volume' ' %s in disk model %s' % (logical_volume.get( 'name'), disk_model.get('name'))) if not attrs.get('rings'): raise SwiftModelException('The rings item is' ' missing from logical' ' volume' ' %s in disk model %s' % (logical_volume.get( 'name'), disk_model.get('name'))) for ring in attrs.get('rings'): if isinstance(ring, str): ring_name = ring else: ring_name = ring.get('name') server_ip, bind_port = self._get_server_bind( 
ring_name, network_names) if not server_ip: # TODO: this may be worth warning break swift_drive_name = LVM_MOUNT + str(lv_index) device_name = '/dev/' + vg_name + '/' + lv_name device_info = DeviceInfo({ 'cloud': cloud, 'control_plane': control_plane, 'server_groups': server_groups, 'region_id': 1, # later, the server group may 'zone_id': 1, # change these defaults 'server_name': server_name, 'network_names': network_names, 'server_ip': server_ip, 'server_bind_port': bind_port, 'replication_ip': server_ip, 'replication_bind_port': bind_port, 'swift_drive_name': swift_drive_name, 'device_name': device_name, 'ring_name': ring_name, 'group_type': 'lvm', 'block_devices': {'percent': percent, 'physicals': physical_volumes}, 'presence': 'present'}) yield device_info lv_index += 1 def _get_server_pass_through(self, server_name): for server in self.servers: if server_name == server.get('ardana_ansible_host', server.get('name')): pass_through_data = server.get('pass_through', {}) if pass_through_data and isinstance(pass_through_data, dict): return pass_through_data.get('swift', {}) return {} def server_draining(self, server_name): if self._get_server_pass_through(server_name).get('drain'): return True return False def server_removing(self, server_name): if self._get_server_pass_through(server_name).get('remove'): return True return False def _get_server_bind(self, ring_name, network_names): network_name, network_ip_address, network_port = \ self.consumes.get_network_name_port(ring_name, network_names) if not network_name: return None, None return (network_ip_address, network_port) def __repr__(self): output = '\nInput Model\n' output += '-----------\n\n' output += '\n Servers\n' output += ' Number: %s' % len(self.servers) output += '\n Device Information\n' for di in self.iter_devices(): output += '\n device info: %s' % di return output 
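The server_draining()/server_removing() helpers in ardana_model.py rely on a per-server "pass_through" dict carrying a "swift" sub-dict with "drain" or "remove" flags. A minimal sketch of that lookup convention, using hypothetical server records:

```python
# Hypothetical server records; the real ones come from the input model.
servers = [
    {'name': 'server1', 'pass_through': {'swift': {'drain': True}}},
    {'name': 'server2'},
]

def get_swift_pass_through(server_name):
    # Mirrors ServersModel._get_server_pass_through(): find the server by
    # its ansible host (falling back to name) and return its swift
    # pass-through data, defaulting to an empty dict.
    for server in servers:
        if server_name == server.get('ardana_ansible_host',
                                     server.get('name')):
            pass_through = server.get('pass_through', {})
            if isinstance(pass_through, dict):
                return pass_through.get('swift', {})
    return {}

print(bool(get_swift_pass_through('server1').get('drain')))  # True
print(bool(get_swift_pass_through('server2').get('drain')))  # False
```

Returning {} rather than None keeps the callers simple: `.get('drain')` and `.get('remove')` are always safe to chain.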
0707010000004B000081A4000003E800000064000000015BE06E03000048A4000000000000000000000000000000000000004800000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/rings/ring_builder.py# (c) Copyright 2015-2017 Hewlett Packard Enterprise Development LP # (c) Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # from os import listdir import os.path import subprocess import sys from datetime import timedelta import yaml import json from swift.common.ring import RingBuilder as SwiftRingBuilder from swiftlm.rings.ring_model import DeviceInfo, RingSpecification def human_size(bytes): units = ['Bytes', 'KB', 'MB', 'GB', 'PB', 'EB'] k = 1024.0 for unit in units: if bytes < k: return '%s%s' % ('{:.2f}'.format(bytes), unit) bytes = bytes / k return '%s%s' % ('{:.2f}'.format(bytes * k), unit) class RingDelta(object): """ The ring delta describes rings in actionable terms The ring delta contains the following: delta_rings Is all known ring specifications. Most ring specifications originate in the input model. However, we might find a builder file for a ring that has been since deleted from the input model. delta_ring_actions Describes the actions we will take against delta_rings (such as create or change replica-count). delta_devices Is all known devices. Most devices originate in the input model. However, we may find a device in a builder file (because the device or server has since been removed from the input model). 
In addition to listing the device attributes, we also record the
    action (add, remove, change weight) that should happen to the device.
    """

    def __init__(self):
        self.delta_rings = {}
        self.delta_ring_actions = {}
        self.delta_devices = []
        self.primary = True

    def read_from_file(self, fd, fmt):
        if fmt == 'yaml':
            model = yaml.safe_load(fd)
        else:
            # json.load() reads from a file object (json.loads() expects a
            # string)
            model = json.load(fd)
        self.load_model(model)
        fd.close()

    def write_to_file(self, fd, fmt):
        data = self.dump_model()
        if fmt == 'yaml':
            output = yaml.safe_dump(data, default_flow_style=False)
        else:
            output = json.dumps(data, indent=2)
        fd.write(output)
        if not fd == sys.stdout:
            fd.close()

    def __repr__(self):
        output = ''
        for ring_name in self.delta_rings.keys():
            output += '-----------------------------\n'
            output += 'ring_name: %s\n ring_spec: %s' % (
                ring_name, self.delta_rings[ring_name])
        for ring_name in self.delta_ring_actions:
            output += '-----------------------------\n'
            output += 'ring_name: %s\n action: %s' % (
                ring_name, self.delta_ring_actions[ring_name])
        for device in self.delta_devices:
            output += '-----------------------------\n'
            output += 'DEVICE\n'
            output += '%s\n' % device
        return output

    def dump_model(self):
        staged_rings = []
        for ring_name in self.delta_rings.keys():
            ring_specification = self.delta_rings[ring_name]
            staged_rings.append({'ring_name': ring_name,
                                 'ring_specification':
                                     ring_specification.dump_model()})
        stage_ring_actions = []
        for ring_name in self.delta_ring_actions.keys():
            action = self.delta_ring_actions[ring_name]
            stage_ring_actions.append({'ring_name': ring_name,
                                       'action': action})
        staged_devices = []
        for device in self.delta_devices:
            staged_devices.append(device.dump_model())
        return {'delta_rings': staged_rings,
                'delta_ring_actions': stage_ring_actions,
                'delta_devices': staged_devices}

    def load_model(self, data):
        staged_rings = data.get('delta_rings')
        for staged_ring in staged_rings:
            ring_name = staged_ring.get('ring_name')
            ring_specification = RingSpecification(None)
            ring_specification.load_model(staged_ring.get(
                'ring_specification'))
            self.delta_rings[ring_name] = ring_specification
        stage_ring_actions = data.get('delta_ring_actions')
        for stage_ring_action in stage_ring_actions:
            ring_name = stage_ring_action.get('ring_name')
            action = stage_ring_action.get('action')
            self.delta_ring_actions[ring_name] = action
        for staged_device in data.get('delta_devices'):
            device = DeviceInfo()
            device.load_from_model(staged_device)
            self.delta_devices.append(device)

    def append_device(self, device_info):
        self.delta_devices.append(device_info)

    def register_primary(self, is_primary_state):
        self.primary = is_primary_state

    def register_ring(self, ring_name, ring_specification):
        self.delta_rings[ring_name] = ring_specification
        self.delta_ring_actions[ring_name] = ['undetermined']

    def sort(self):
        # Use the key= form; the positional cmp argument is Python 2 only
        self.delta_devices = sorted(self.delta_devices,
                                    key=DeviceInfo.sortkey)

    def get_report(self, options):
        output = ''
        if not self.primary:
            output += 'This is a secondary site - copy builder and ring\n' \
                      'files from the primary site.'
return output output += 'Rings:\n' for ring_name in self.delta_rings.keys(): if options.limit_ring and (options.limit_ring != ring_name): continue output += ' %s:\n' % ring_name.upper() if self.delta_ring_actions.get(ring_name) == ['add']: output += ' ring will be created\n' else: remaining = self.delta_rings[ring_name].get('remaining') output += ' ring exists (minimum time to next' \ ' rebalance: %s)\n' % remaining if (self.delta_ring_actions.get(ring_name) == ['set-replica-count']): output += ' replica-count will be changed\n' if (self.delta_ring_actions.get(ring_name) == ['set-min-part-hours']): output += ' min-part-hours will be changed\n' num_devices_to_add = 0 size_to_add = 0 num_devices_to_remove = 0 size_to_remove = 0 size_to_reweight = 0 num_devices_to_set_weight = 0 for device_info in self.delta_devices: if device_info.ring_name == ring_name: if device_info.presence == 'add': if options.detail == 'full': output += ' add: %s\n' % device_info.meta num_devices_to_add += 1 size_to_add += (float(device_info.target_weight) * options.size_to_weight) elif device_info.presence == 'remove': if options.detail == 'full': output += ' remove: %s\n' % device_info.meta num_devices_to_remove += 1 size_to_remove += (float(device_info.current_weight) * options.size_to_weight) elif device_info.presence == 'set-weight': if options.detail == 'full': size_to_reweight += (abs( float(device_info.target_weight) - float(device_info.current_weight)) * options.size_to_weight) output += ' set-weight %s %s > %s > %s\n' % ( device_info.meta, device_info.current_weight, device_info.target_weight, device_info.model_weight) num_devices_to_set_weight += 1 if num_devices_to_add: output += ' will add %s devices (%s)\n' % ( num_devices_to_add, human_size(size_to_add)) if num_devices_to_set_weight: output += ' will change weight on %s' \ ' devices (%s)\n' % (num_devices_to_set_weight, human_size(size_to_reweight)) if num_devices_to_remove: output += ' will remove %s' \ ' devices (%s)\n' % 
(num_devices_to_remove, human_size(size_to_remove)) if not (num_devices_to_add or num_devices_to_set_weight or num_devices_to_remove): output += ' no device changes\n' output += ' ring will be rebalanced\n' return output class RingBuilder(object): def __init__(self, builder_dir=None, read_rings=False): """ Read and issue ring update commands This class handles access to the builder files of a set of rings. :param builder_dir: The directory where the builder files are located :param read_rings: If True, read from existing builder files """ self.builder_dir = builder_dir self.flat_device_list = [] self.builder_rings = {} if builder_dir and not os.path.isdir(builder_dir): raise IOError('%s is not a directory' % builder_dir) if read_rings: for filename in [f for f in listdir(builder_dir) if (os.path.isfile(os.path.join(builder_dir, f)) and f.endswith('.builder'))]: ring_name = filename[0:filename.find('.builder')] self.replica_count = 0.0 self.balance = 0.0 for device_info in self._get_devs_from_builder(os.path.join( builder_dir, filename)): device_info.ring_name = ring_name self.flat_device_list.append(device_info) self.register_ring(ring_name, self.replica_count, self.balance, self.dispersion, self.min_part_hours, self.remaining, self.overload) def __repr__(self): output = ' RING FILES\n' for ring_name in self.builder_rings.keys(): output += ' ring: %s' % ring_name output += ' ringspec: %s\n' % self.builder_rings[ring_name] output += ' FLAT DEVICES\n' for drive_detail in self.flat_device_list: output += ' %s\n' % drive_detail return output def register_ring(self, ring_name, replica_count, balance=0.0, dispersion=0.0, min_part_hours=24, remaining='unknown', overload=0.0): if not self.builder_rings.get(ring_name): model = {'name': ring_name, 'partition_power': 0, 'replication_policy': {'replica_count': replica_count}, 'display_name': 'unknown', 'balance': balance, 'dispersion': dispersion, 'min_part_hours': min_part_hours, 'remaining': remaining, 'overload': overload} 
ringspec = RingSpecification(None) ringspec.load_model(model) self.builder_rings[ring_name] = ringspec def get_ringspec(self, ring_name): return self.builder_rings[ring_name] def _get_devs_from_builder(self, builder_filename): ring_name = builder_filename[0:builder_filename.find('.builder')] try: builder = SwiftRingBuilder.load(builder_filename) except Exception as e: raise IOError('ERROR: swift-ring-builder Problem occurred while ' 'reading builder file: %s. %s' % (builder_filename, e)) self.partitions = builder.parts self.replica_count = builder.replicas balance = 0 if builder.devs: balance = builder.get_balance() self.balance = balance self.dispersion = 0.00 if builder.dispersion: self.dispersion = float(builder.dispersion) self.min_part_hours = builder.min_part_hours self.remaining = str(timedelta(seconds=builder.min_part_seconds_left)) self.overload = builder.overload if builder.devs: balance_per_dev = builder._build_balance_per_dev() for dev in builder._iter_devs(): if dev in builder._remove_devs: # This device was added and then later removed without # doing a rebalance in the meantime. We can ignore # since it's marked for deletion. 
continue device_info = DeviceInfo( { 'ring_name': ring_name, 'zone_id': dev['zone'], 'region_id': dev['region'], 'server_ip': dev['ip'], 'server_bind_port': dev['port'], 'replication_ip': dev['replication_ip'], 'replication_bind_port': dev['replication_port'], 'swift_drive_name': dev['device'], 'current_weight': dev['weight'], 'balance': balance_per_dev[dev['id']], 'meta': dev['meta'], 'presence': 'present' }) yield device_info def device_count(self, ring_name): count = 0 for device_info in self.flat_device_list: if ring_name == device_info.ring_name: count += 1 return count def command_ring_create(self, ringspec): ring_name = ringspec.name builder_path = os.path.join(self.builder_dir, '%s.builder' % ring_name) return 'swift-ring-builder %s create %s %s %s' % ( builder_path, ringspec.partition_power, ringspec.replica_count, ringspec.min_part_hours) def command_set_replica_count(self, ringspec): ring_name = ringspec.name builder_path = os.path.join(self.builder_dir, '%s.builder' % ring_name) return 'swift-ring-builder %s set_replicas %s' % ( builder_path, ringspec.replica_count) def command_set_min_part_hours(self, ringspec): ring_name = ringspec.name builder_path = os.path.join(self.builder_dir, '%s.builder' % ring_name) return 'swift-ring-builder %s set_min_part_hours %s' % ( builder_path, ringspec.min_part_hours) def command_device_add(self, device_info): ring_name = device_info.ring_name builder_path = os.path.join(self.builder_dir, '%s.builder' % ring_name) if not device_info.replication_bind_port: device_info.replication_bind_port = device_info.server_bind_port if not device_info.replication_ip: device_info.replication_ip = device_info.server_ip return ('swift-ring-builder %s add' ' --region %s --zone %s' ' --ip %s --port %s' ' --replication-port %s --replication-ip %s' ' --device %s --meta %s' ' --weight %s' % (builder_path, device_info.region_id, device_info.zone_id, device_info.server_ip, device_info.server_bind_port, device_info.replication_bind_port, 
device_info.replication_ip,
                   device_info.swift_drive_name,
                   device_info.meta,
                   device_info.target_weight))

    def command_pretend_min_part_hours_passed(self, ringspec):
        ring_name = ringspec.name
        builder_path = os.path.join(self.builder_dir,
                                    '%s.builder' % ring_name)
        return ('swift-ring-builder %s pretend_min_part_hours_passed' %
                builder_path)

    def command_rebalance(self, ringspec):
        ring_name = ringspec.name
        builder_path = os.path.join(self.builder_dir,
                                    '%s.builder' % ring_name)
        return ('swift-ring-builder %s rebalance 999' % builder_path)

    def command_device_set_weight(self, device_info):
        ring_name = device_info.ring_name
        builder_path = os.path.join(self.builder_dir,
                                    '%s.builder' % ring_name)
        ipaddr = device_info.server_ip
        swift_drive_name = device_info.swift_drive_name
        search = '%s/%s' % (ipaddr, swift_drive_name)
        return ('swift-ring-builder %s set_weight %s %s' %
                (builder_path, search, device_info.target_weight))

    def command_device_remove(self, device_info):
        ring_name = device_info.ring_name
        builder_path = os.path.join(self.builder_dir,
                                    '%s.builder' % ring_name)
        ipaddr = device_info.server_ip
        swift_drive_name = device_info.swift_drive_name
        search = '%s/%s' % (ipaddr, swift_drive_name)
        return ('swift-ring-builder %s remove %s' % (builder_path, search))

    @staticmethod
    def run_cmd(cmd):
        status = 0
        try:
            output = subprocess.check_output(cmd.split())
        except subprocess.CalledProcessError as err:
            status = err.returncode
            output = err.output
            # Only reached on a non-zero exit; a clean run keeps status 0
            if int(status) <= 1:
                # Exited with EXIT_WARNING
                status = -1
        return (int(status), output)
0707010000004C000081A4000003E800000064000000015BE06E0300007E8F000000000000000000000000000000000000004600000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/rings/ring_model.py# (c) Copyright 2015, 2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # import os from yaml import safe_load class SwiftModelException(Exception): def __init__(self, value): self.value = value def __str__(self): return repr(self.value) class RingSpecification(dict): """ Specification of a single ring This input has the following structure: name: # Ring name display_name: weight-step: # Optional. (default None) partition_power: min_part_hours: # Old name is min_part_time remaining: # For rings read from a builder # file, this is set; otherwise # it is not part of input model default: # Optional (default: False) server_bind_port: # Reserved replication_bind_port: # Reserved replication_policy: # Must be present for account/ replica_count: # container rings erasure_coding_policy; # Optional for object rings ec_num_data_fragments: ec_num_parity_fragments: ec_type: jerasure_rs_vand # Optional ec_object_segment_size: swift_zones: # Optional. id: # in the input model server_groups: - AZ1 - OTHER balance: # Not in input model, but is in # rings read from a builder file parent: # Not in model. 
A pointer to # containing specification # (to inherit region and zone) """ keynames = ['name', 'display_name', 'partition_power', 'min_part_hours', 'remaining', 'default', 'server_bind_port', 'replication_bind_port', 'replication_policy', 'erasure_coding_policy', 'swift_zones', 'balance', 'parent', 'weight_step'] def __init__(self, parent): super(RingSpecification, self).__init__() self.update({'parent': parent}) def __getattr__(self, item): # Special cases if item == 'replica_count': if self.get('replication_policy'): return float( self.get('replication_policy').get('replica_count')) elif self.get('erasure_coding_policy'): ec = self.get('erasure_coding_policy') return float((ec.get('ec_num_data_fragments') + ec.get('ec_num_parity_fragments'))) return None elif item == 'min_part_hours': # Elsewhere we may disallow a default return self.get('min_part_hours', self.get('min_part_time', 48)) elif item == 'weight_step': if self.get('weight_step'): return float(self.get('weight_step')) else: return None # Return value for valid items if item in RingSpecification.keynames: return self.get(item, None) else: raise AttributeError def __setattr__(self, item, value): # Special cases if item == 'replica_count': if self.get('replication_policy'): self.get('replication_policy')['replica_count'] = float(value) return elif self.get('erasure_coding_policy'): raise SwiftModelException('Cannot set replica-count' ' directly on an EC ring') # Set value for valid items if item in RingSpecification.keynames: self[item] = value else: raise AttributeError def __repr__(self): output = '(ring) name: %s,' % self.name output += ' display-name: %s,' % self.display_name output += ' partition-power: %s,' % self.partition_power output += ' replica_count: %s' % self.__getattr__('replica_count') return output def dump_model(self): model = {} for key in self.keys(): if key not in ['parent']: model[key] = self.get(key) return model def load_model(self, model): self.update(model) if not 
self.get('server_bind_port', None): if self.get('name').startswith('account'): port = 6002 elif self.get('name').startswith('container'): port = 6001 else: port = 6000 self['server_bind_port'] = port self['replication_bind_port'] = self.server_bind_port self.validate() def validate(self): if self.get('min_part_hours') and self.get('min_part_time'): raise SwiftModelException('Ring: %s has specified both' ' min-part-time and' ' min-part-hours. Please use' ' min-part-hours.' % self.name) if not (self.get('min_part_hours') or self.get('min_part_time')): raise SwiftModelException('Ring: %s is missing' ' min-part-hours or has a' ' value of zero.' % self.name) if self.replication_policy and self.erasure_coding_policy: raise SwiftModelException('Ring: %s has specified both' ' replication_policy and' ' erasure_coding_policy. Only one' ' may be specified.' % self.name) if not (self.replication_policy or self.erasure_coding_policy): raise SwiftModelException('Ring: %s is missing a policy' ' type (replication_policy or' ' erasure_coding_policy).' % self.name) if self.swift_zones: groups = [] for zone in self.swift_zones: zone_id = zone.get('id') if zone_id is None: raise SwiftModelException('Ring: %s is missing id field' ' in a swift-zones' ' entry' % self.name) try: _ = int(zone_id) except ValueError: raise SwiftModelException('Ring: %s has invalid' ' id value in' ' in a swift-zones' ' entry' % self.name) groups.extend(zone.get('server_groups', [])) if groups: if len(groups) != len(set(groups)): raise SwiftModelException('Ring: %s has duplicate' ' server-group name' ' in a swift-zones' ' entry' % self.name) def get_zone(self, server_groups): """ Get zone id for given server group :param server_groups: Server groups to search for :returns: -1 zones not specified. 
None means server_group not found """ if self.swift_zones: for zone in self.swift_zones: zone_id = zone.get('id') for group in server_groups: if group in zone.get('server_groups', []): return zone_id return None else: return -1 class ControlPlaneRings(object): """ Rings in a given control plane This class represents the input model's ring specfications for a control plane. This is specified in a configuration-data object. For multi-site, the primary control plane should have the primary-control-plane attribute set to true and the other sites should set to false. The input model looks like: primary_control_plane: # optional - default to True rings: - # see that class for details - second ring, etc swift-regions: - id: server-gropups: - - other group - etc swift-regions: - id: server-groups: - - other group - etc """ def __init__(self, parent): self.parent = parent self.region_name = '' self.swift_regions = [] self.swift_zones = [] self.rings = [] def __repr__(self): output = 'region-name: %s\n' % self.region_name output += 'swift-regions: %s\n' % self.swift_regions output += 'swift-zones: %s\n' % self.swift_zones output += '----rings----\n' for ringspec in self.rings: output += '\n%s' % ringspec output += '----end ring----\n' return output def load_model(self, model): self.region_name = model.get('region_name') self.primary_control_plane = model.get('primary_control_plane', True) if not self.primary_control_plane: return # only attributes of primary site are used for ring in model.get('rings'): ringspec = RingSpecification(self) ringspec.load_model(ring) self.rings.append(ringspec) if model.get('swift_regions'): self.swift_regions = model.get('swift_regions') if model.get('swift_zones'): self.swift_zones = model.get('swift_zones') self.validate() def validate(self): if self.swift_regions: groups = [] for region in self.swift_regions: region_id = region.get('id') if region_id is None: raise SwiftModelException('Rings in region %s is missing' ' id field' ' in a 
swift-regions' ' entry' % self.region_name) try: _ = int(region_id) except ValueError: raise SwiftModelException('Rings in region %s has invalid' ' id value in' ' in a swift-regions' ' entry' % self.region_name) groups.extend(region.get('server_groups', [])) if groups: if len(groups) != len(set(groups)): raise SwiftModelException('Rings in region %s has ' ' duplicate' ' server-group name' ' in a swift-regions' ' entry' % self.region_name) if self.swift_zones: groups = [] for zone in self.swift_zones: zone_id = zone.get('id') if zone_id is None: raise SwiftModelException('Rings in region %s is missing' ' id field in a swift-zones' ' entry' % self.region_name) try: _ = int(zone_id) except ValueError: raise SwiftModelException('Rings in region %s has invalid' ' id value in' ' in a swift-zones' ' entry' % self.region_name) groups.extend(zone.get('server_groups', [])) if groups: if len(groups) != len(set(groups)): raise SwiftModelException('Rings in region %s has ' ' duplicate' ' server-group name' ' in a swift-zones' ' entry' % self.region_name) def get_ringspec(self, ring_name): for ringspec in self.rings: if ring_name == ringspec.name: return ringspec return None def is_primary_control_plane(self): return self.primary_control_plane def get_region(self, server_groups): if self.swift_regions: for region in self.swift_regions: region_id = region.get('id') for group in server_groups: if group in region.get('server_groups', []): return region_id return None else: return -1 def get_zone(self, server_groups): """ Get zone at control plane level """ if self.swift_zones: for zone in self.swift_zones: zone_id = zone.get('id') for group in server_groups: if group in zone.get('server_groups', []): return zone_id return None else: return -1 def get_region_zone(self, ring_name, server_groups): """ Get region/zone; -1 means not defined """ swift_region_id = self.get_region(server_groups) # See if zone defined at ring level swift_zone_id = -1 for ringspec in self.rings: if ring_name 
== ringspec.name: swift_zone_id = ringspec.get_zone(server_groups) if swift_zone_id > 0: return (swift_region_id, swift_zone_id) if swift_zone_id == -1: # Not defined at ring level -- get from control plane (this) level swift_zone_id = self.get_zone(server_groups) return swift_region_id, swift_zone_id class RingSpecifications(object): """ Specification of rings from multiple sites When there are multiple sites, one of the sites contains the definitive primary ring specifications. The other site(s) have a specification that they import the primary rings. The rings are specified via a configuration-data object in the inout model. For Swift, this looks like: config_data: control_plane_rings: # See that class for details : The model looks like: global: all_ring_specifications: - # See that class for details - a second region was allowed here; never used; now deprecated """ def __init__(self, cloud, control_plane, fd=None, model=None, configuration_data=None): self.control_planes = {} if fd: self.load_model(safe_load(fd)) if model: # Used by unit tests self.load_model(cloud, control_plane, model) if configuration_data: self.load_configuration(cloud, control_plane, configuration_data) def __repr__(self): output = 'specs from all clouds:' for cl, cp in self.control_planes.keys(): output += '\n------\ncloud: %s control-plane:%s:' % (cl, cp) output += '\n%s' % self.control_planes[(cl, cp)] return output def load_configuration(self, cloud, control_plane, config_data): config_data = dash_to_underscore(config_data) if config_data.get('control_plane_rings'): control_plane_rings = ControlPlaneRings(self) control_plane_rings.load_model( config_data.get('control_plane_rings')) self.control_planes[(cloud, control_plane)] = control_plane_rings else: # FIXME: fail if there is no specification (when CP translates) pass def load_model(self, cloud, control_plane, model): # Used by unit tests ringspecs = model.get('global').get('all_ring_specifications') if not ringspecs: return for 
ksregion in ringspecs: keystone_region_rings = ControlPlaneRings(self) keystone_region_rings.load_model(ksregion) self.control_planes[(cloud, control_plane)] = keystone_region_rings def get_control_plane_rings(self, cloud, control_plane): return self.control_planes.get((cloud, control_plane), None) class DeviceInfo(dict): """ Represents all the data connected with a device This information is used in two contexts: - In the ring delta, it is sourced from the input model and specifies changes (if any) to make to the rings - Is created by reading an existing ring builder file Some items are present or absent depending on context as follows: Item Input Builder Description Model File ---- ----- ------- --------------------------------------- region_name yes yes Region name server_groups yes no The groups the server is a member of region_id yes yes Region number zone_id yes yes Zone number network_names yes no Alternate names of the server server_name yes no Server name (ardana anisible name) server_ip yes yes IP address of server server_bind_port yes yes Bind port for this device replication_ip yes yes Replication IP address replication_bind_ip yes yes Replication bind port of this device swift_drive_name yes yes Device/mount point name (e.g., disk3) device_name yes no Block device name (e.g., /dev/sdd) ring_name yes yes Ring name group_type yes no Block device or LVM device presence yes yes Action (if any) to take current_weight yes yes Current weight of device target_weight yes no Planned weight when added or changed model_weight yes no The final weight based on model balance no yes Current balance of the device meta yes yes Meta item block_devices yes no For LVM, the underlying block devices """ keynames = ['region_name', 'server_groups', 'region_id', 'zone_id', 'network_names', 'server_name', 'server_ip', 'server_bind_port', 'replication_ip', 'replication_bind_port', 'swift_drive_name', 'device_name', 'ring_name', 'group_type', 'presence', 'current_weight', 
'target_weight', 'model_weight', 'balance', 'meta', 'block_devices'] def __init__(self, model=None): super(DeviceInfo, self).__init__() if model: self.load_from_model(model) def __getattr__(self, item): if item in ['zone_id', 'region_id']: return str(self.get(item, None)) if item in ['network_names', 'server_groups']: return self.get(item, []) if item in DeviceInfo.keynames: return self.get(item, None) else: raise AttributeError('No key %s in %s' % (item, self)) def __setattr__(self, item, value): if item in DeviceInfo.keynames: self.update({item: value}) else: raise AttributeError('No key %s in %s' % (item, self)) def is_same_device(self, device_info): if (self.region_name == device_info.region_name and self.ring_name == device_info.ring_name and self.server_ip == device_info.server_ip and self.swift_drive_name == device_info.swift_drive_name): return True return False def is_bad_change(self, device_info): if self.zone_id != device_info.zone_id: return 'swift-zone id from %s to %s' % (device_info.zone_id, self.zone_id) if self.region_id != device_info.region_id: return 'swift-region id from %s to %s' % (device_info.region_id, self.region_id) return None def dump_model(self): return self.copy() def load_from_model(self, model): self.update(model) if not self.region_name: self['region_name'] = 'unknown' if not self.group_type: self['group_type'] = 'device' if not self.presence: self['presence'] = 'present' if not self.balance: self['balance'] = 0 if not self.meta and (self.swift_drive_name and self.device_name): self['meta'] = '%s:%s:%s' % (self.server_name, self.swift_drive_name, self.device_name) @staticmethod def sortkey(device_info): return (device_info.region_name + device_info.ring_name + device_info.server_ip + device_info.swift_drive_name) class DriveConfigurations(object): """ Represents all disk configuration of all known servers This contains the drive configuration for all servers. 
It provides a convenient way of getting the size of any given drive
    (/dev/sda) or partition (/dev/sda1).
    """

    def __init__(self):
        self.configurations = {}
        self.drive_data = {}

    def add(self, configuration):
        hostname = configuration.get('hostname')
        self.configurations[hostname] = configuration
        for name, size, fulldrive in configuration.iter_drive_info():
            self.drive_data[(hostname, name)] = (size, fulldrive)

    def get_drive_configuration(self, hostname):
        return self.configurations.get(hostname)

    def iter_drives(self):
        for hostname in self.configurations.keys():
            drive_configuration = self.configurations.get(hostname)
            for (name, size,
                 fulldrive) in drive_configuration.iter_drive_info():
                yield (hostname, name, size, fulldrive)

    def get_drive_data(self, hostname, name):
        return self.drive_data.get((hostname, name), (None, None))

    def get_hw(self, hostname, device_info):
        """
        Get the actual size of a swift device and whether it is a partition

        The device_info may refer to a block device or to a volume in an
        LVM. If a block device, it may refer to a specific partition. An
        LVM is made up of several physical drives and the space is then
        allocated on a percentage basis to each volume. We need the volume
        size.

        The fulldrive flag indicates whether the swift drive uses all of
        the physical drive or is just one of many partitions on the drive.
        An LVM volume is considered a full drive.
:param hostname: server name (used to find the drive configuration) :param device_info: description of the drive :returns: tuple of size, fulldrive """ hw_size = None if device_info.group_type == 'device': physical = device_info.block_devices.get('physicals')[0] hw_size, hw_fulldrive = self.get_drive_data(hostname, physical) else: # LVM -- work out size from physical drives and % size physical_volumes_total = 0 for physical in device_info.block_devices.get('physicals'): if physical.endswith('_root'): # Templated device name - convert /dev/sdX_root to /dev/sdX # This makes the LVM appear slightly bigger than it # actually is since the boot partition gets counted in the # size. physical = physical[:-len('_root')] block_device_size, hw_fulldrive = self.get_drive_data( hostname, physical) if block_device_size: physical_volumes_total += block_device_size if physical_volumes_total: percent_human = device_info.block_devices.get( 'percent') try: # should be something such as 20% percent = int(percent_human.split('%')[0]) hw_size = physical_volumes_total * percent / 100 except ValueError: hw_size = None hw_fulldrive = True return hw_size, hw_fulldrive class DriveConfiguration(dict): """ Represents the disk drive configuration of a server The data originates in osconfig probe_hardware module. Normally the input model contains device names (e.g., /dev/sda). The swiftlm drive provision process will own the fill drive, hence it will create a single partition spanning the drive. Hence osconfig probe_hardware will return both the drive (/dev/sda) and the partition (/dev/sda1). For our purposes we use /dev/sda whether or not it has been partitioned. Alternatively (mostly to cope with non-standard layouts), we will allow partition names in the model (e.g., /dev/sda1). To cope with this we return the drive and all partitions, but mark them as not being a full drive. 
The hostname key here is my_ardana_ansible_name which should match the server_name based on ardana_ansible_host in the control_plane_servers. The input data looks like (yaml): ipaddr: null hostname: blah1 drives: - name: /dev/sda bytes: 100000000 partitions: - partition: sda1 bytes: 100000000 - name: /dev/sdb bytes: 200000000 partitions: [] - name: /dev/sdc bytes: 300000 partitions: - partition: sdc1 bytes: 100000 - partition: sdc2 bytes: 200000 """ keynames = ['ipaddr', 'hostname', 'drives'] def __init__(self): super(DriveConfiguration, self).__init__() def load_model(self, model): self.update(model) def iter_drive_info(self): """ Get list of drives :return: an iterable where each item is a tuple containing: name of drive (e.g. /dev/sda) size (in bytes) boolean, where True means device is the full drive; False means it is a partition on the drive """ for drive in self.get('drives'): name = drive.get('name') size = drive.get('bytes') partitions = drive.get('partitions') if len(partitions) == 0: yield (name, size, True) elif len(partitions) == 1: # Single partition (e.g., sda1) using full drive # Return drive name (e.g., /dev/sda) covers both yield (name, size, True) else: # Drive is partitioned into smaller chunks # Return drive and all partitions, e.g., # /dev/sda, /dev/sda1, /dev/sda2 yield (name, size, False) for partition in partitions: yield (os.path.join('/dev', partition.get('partition')), partition.get('bytes'), False) class Consumes(object): """ Manages the SWF_RNG variable produced by the Configuration Processor. The idea is that the swift-ring-builder/SWF-RNG is defined to consume SWF-ACC, SWF-CON and SWF-OBJ. This means the Configuration Processor will create the appropriate consumes relationships for those services. In other words, the CP will tell us whether to use blah-mgmt or blah-obj as the appropriate network to communicate on.
The input model looks like: consumes_SWF_ACC: members: private: - host: standard-ccp-c1-m3-obj ip_address: 192.168.245.9 port: 6002 use_tls: false - host: standard-ccp-c1-m2-obj ip_address: 192.168.245.10 port: 6002 use_tls: false - host: standard-ccp-c1-m1-obj ip_address: 192.168.245.11 port: 6002 use_tls: false consumes_SWF_CON: etc... consumes_SWF_OBJ: etc... The main function used is get_network_name_port(). For example, given an input model as shown above, a ringname/ringtype of 'account' and a list of 'standard-ccp-c1-m2-ardana', 'standard-ccp-c1-m2-mgmt' and 'standard-ccp-c1-m2-obj', the function returns ('standard-ccp-c1-m2-obj', '192.168.245.10', 6002) ...since the appropriate network for the account-server is the '-obj' network. """ @classmethod def get_node_list(cls, item, model): node_list = model[item]['members']['private'] return node_list def __init__(self, model=None): self.nodes = {} self.nodes['account'] = [] self.nodes['container'] = [] self.nodes['object'] = [] self.host_to_network = {} if model: # Used by unit tests self.load_model(model) def load_model(self, model): self.nodes['account'].extend(Consumes.get_node_list( 'consumes_SWF_ACC', model)) self.nodes['container'].extend(Consumes.get_node_list( 'consumes_SWF_CON', model)) self.nodes['object'].extend(Consumes.get_node_list( 'consumes_SWF_OBJ', model)) for ringtype in ['account', 'container', 'object']: for node in self.nodes[ringtype]: network_name = node['host'] network_ip_address = node['ip_address'] network_port = node['port'] self.host_to_network[(ringtype, network_name)] = ( network_name, network_ip_address, network_port) def get_network_name_port(self, ringtype, host_network_names): """ :param ringtype: Type of ring (account, etc.)
Can be ring name (e.g., object-0) :param host_network_names: list of network names for the host (e.g., blah-mgmt, blah-obj, blah-ardana) :return: tuple of name, ip-address, port """ if ringtype.startswith('object'): ringtype = 'object' for hostname in host_network_names: result = self.host_to_network.get((ringtype, hostname)) if result: return result return (None, None, None) def dash_to_underscore(obj): """ Convert dashes to underscores in an input model object :param obj: The object to convert :return:The converted object """ if isinstance(obj, list): ret_obj = [] for item in obj: ret_obj.append(dash_to_underscore(item)) elif isinstance(obj, dict): ret_obj = dict() for key in obj.keys(): new_key = key.replace('-', '_') ret_obj[new_key] = dash_to_underscore(obj[key]) else: ret_obj = obj return ret_obj 0707010000004D000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000003800000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/swift0707010000004E000081A4000003E800000064000000015BE06E0300000000000000000000000000000000000000000000004400000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/swift/__init__.py0707010000004F000081A4000003E800000064000000015BE06E0300000F8C000000000000000000000000000000000000004700000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/swift/drive_audit.py # (c) Copyright 2015 Hewlett Packard Enterprise Development LP # (c) Copyright 2017 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. 
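The get_hw() method above sizes an LVM-backed swift device by totalling the physical drives behind the volume group, trimming templated '_root' device names, and then applying the volume's percentage allocation. A standalone sketch of that arithmetic (the drive sizes and the '20%' string are made-up inputs, and `lvm_volume_size` is a hypothetical helper, not part of swiftlm):

```python
def lvm_volume_size(physicals, drive_sizes, percent_human):
    """Estimate an LVM volume's size in bytes (sketch of get_hw()'s LVM branch).

    physicals:     device names backing the volume group, possibly templated
                   (e.g. '/dev/sda_root' meaning '/dev/sda')
    drive_sizes:   {device_name: size_in_bytes}
    percent_human: allocation string such as '20%'
    """
    total = 0
    for physical in physicals:
        if physical.endswith('_root'):
            # Templated name: /dev/sdX_root -> /dev/sdX. This slightly
            # over-counts, since the boot partition is included in the size.
            physical = physical[:-len('_root')]
        total += drive_sizes.get(physical, 0)
    percent = int(percent_human.split('%')[0])
    return total * percent // 100

sizes = {'/dev/sda': 100000000, '/dev/sdb': 200000000}
vol = lvm_volume_size(['/dev/sda_root', '/dev/sdb'], sizes, '20%')
print(vol)  # 60000000 -- 20% of the 300 MB total
```

As in the original, a missing drive simply contributes nothing to the total rather than raising an error.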
# import ast import subprocess import os import ConfigParser from swiftlm.utils.metricdata import MetricData from swiftlm.utils.values import Severity ERRORS_PATTERN = 'drive-audit: Errors found:' DEVICES_PATTERN = 'drive-audit: Devices found:' DRIVE_AUDIT_CONF = '/etc/swift/drive-audit.conf' BASE_RESULT = MetricData( name=__name__, messages={ 'ok': 'No errors found on device mounted at: {mount_point}', 'warn': 'No devices found', 'fail': 'Errors found on device mounted at: {mount_point}', 'unknown': 'Unrecoverable error: {error}' } ) def get_devices(output): """ Returns a list of devices, each a dict of mount_point and kernel_device """ # TODO use drive_model.yml to determine drives to check lines = [s.strip() for s in output.split('\n') if s] for line in lines: if DEVICES_PATTERN in line: devs = line.split(DEVICES_PATTERN)[1].strip() devices = ast.literal_eval(devs) return [{'mount_point': d['mount_point'], 'kernel_device': d['kernel_device'][:-1]} for d in devices] def get_error_devices(output): """ Returns a dict mapping device->error count for each device with errors. """ lines = [s.strip() for s in output.split('\n') if s] for line in lines: if ERRORS_PATTERN in line: devices = line.split(ERRORS_PATTERN)[1].strip() err_devices = ast.literal_eval(devices) return err_devices def check_errors(drive_recon_file): try: # TODO check that '/etc/swift/drive-audit.conf' exists and # has log_to_console set process = subprocess.Popen( ['swift-drive-audit', DRIVE_AUDIT_CONF], stdout=subprocess.PIPE, stderr=subprocess.PIPE ) _, output = process.communicate() # Change the permissions of /drive.recon to 644 so that # swift recon may read it.
os.chmod(drive_recon_file, 0o644) except (OSError, IOError) as e: result = BASE_RESULT.child(dimensions={'error': str(e)}) result.value = Severity.unknown return result found_devs = get_devices(output) if not found_devs: result = BASE_RESULT.child() # TODO maybe bump this up to a fail result.value = Severity.warn return result results = [] error_devices = get_error_devices(output) for dev in found_devs: dimensions = dict(dev) del dimensions['kernel_device'] result = BASE_RESULT.child(dimensions=dimensions) if dev.get('kernel_device') in error_devices.keys(): # TODO might like to include error count in value_meta result.value = Severity.fail else: result.value = Severity.ok results.append(result) return results def main(): parser = ConfigParser.RawConfigParser() parser.read(DRIVE_AUDIT_CONF) try: drive_recon_dir = parser.get("drive-audit", "recon_cache_path") except (ConfigParser.NoOptionError, ConfigParser.NoSectionError): drive_recon_dir = "/var/cache/swift" drive_recon_file = os.path.join(drive_recon_dir, "drive.recon") """Checks for corrupted sectors on drives.""" return check_errors(drive_recon_file) 07070100000050000081A4000003E800000064000000015BE06E030000147E000000000000000000000000000000000000004A00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/swift/file_ownership.py # (c) Copyright 2015 Hewlett Packard Enterprise Development LP # (c) Copyright 2017 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. 
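The get_devices() function above pulls the device list out of the swift-drive-audit console output with ast.literal_eval, trimming the trailing partition digit from each kernel device. A standalone sketch of that parsing step, using a made-up log line of the documented shape (`parse_devices` and the sample line are illustrative, not part of swiftlm):

```python
import ast

DEVICES_PATTERN = 'drive-audit: Devices found:'

def parse_devices(output):
    """Return [{'mount_point', 'kernel_device'}] dicts from drive-audit output.

    The trailing character of kernel_device (the partition number) is
    stripped, mirroring get_devices() above.
    """
    for line in (s.strip() for s in output.split('\n') if s):
        if DEVICES_PATTERN in line:
            devices = ast.literal_eval(line.split(DEVICES_PATTERN)[1].strip())
            return [{'mount_point': d['mount_point'],
                     'kernel_device': d['kernel_device'][:-1]}
                    for d in devices]
    return None

# Hypothetical log line; the real text comes from swift-drive-audit.
sample = ("Mar 1 10:00:00 host drive-audit: Devices found: "
          "[{'mount_point': '/srv/node/disk0', 'kernel_device': 'sda1'}]")
parsed = parse_devices(sample)
print(parsed)  # [{'mount_point': '/srv/node/disk0', 'kernel_device': 'sda'}]
```

ast.literal_eval only evaluates Python literals, so a malformed or malicious log line raises rather than executing code.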
# import os import os.path import pwd from swiftlm.utils.utility import server_type from swiftlm.utils.metricdata import MetricData from swiftlm.utils.values import Severity, ServerType from swiftlm.utils.utility import get_swift_mount_point SWIFT_DIR = '/etc/swift' CONF_DIR = '/etc' NODE_DIR = get_swift_mount_point() ZERO_BYTE_EXCLUDE = frozenset(['reload-trigger', 'swauth_to_tenant_map.gz']) SWIFT_OWNED_EXCLUDE = frozenset(['lost+found']) def add_result(results, path, reason): messages = { 'empty': 'Path: {path} should not be empty', 'ownership': 'Path: {path} is not owned by swift', 'missing': 'Path: {path} is missing'} message = messages.get(reason).format(path=path) c = {'path': path, 'message': message} if c not in results: results.append(c) def _is_swift_owned(results, p): # True = good, False = bad owner = pwd.getpwuid(os.stat(p).st_uid).pw_name if owner == 'swift': return True else: if os.path.basename(p) not in SWIFT_OWNED_EXCLUDE: add_result(results, p, 'ownership') return False def not_swift_owned_config(results): # Check /etc/swift and its children p = SWIFT_DIR if os.path.isdir(p): for root, dirs, files in os.walk(p, followlinks=True): for d in dirs: x = os.path.join(root, d) _is_swift_owned(results, x) for f in files: x = os.path.join(root, f) _is_swift_owned(results, x) else: add_result(results, p, 'missing') return results def not_swift_owned_data(results): # Check all disk directories in /srv/node/ p = NODE_DIR if os.path.isdir(p): # We need to use topdown otherwise the directory tree is generated # first. This would be unacceptably slow with large numbers of objects for root, dirs, _ in os.walk(p, topdown=True): for d in dirs: x = os.path.join(root, d) _is_swift_owned(results, x) break # We only want to check immediate child directories else: if server_type(ServerType.object): # We only care that this directory is missing on object servers.
add_result(results, p, 'missing') return results def _is_empty_file(results, p): # True = bad, False = good # Should think of a way to make false = bad here to match _is_swift_owned if not os.path.isfile(p): add_result(results, p, 'missing') return True if (os.stat(p).st_size == 0 and os.path.basename(p) not in ZERO_BYTE_EXCLUDE): add_result(results, p, 'empty') return True return False def empty_files(results): # Check individual files if not server_type(ServerType.proxy): _is_empty_file(results, CONF_DIR + '/rsyncd.conf') _is_empty_file(results, CONF_DIR + '/rsyslog.conf') # Check all children in /etc/swift p = SWIFT_DIR if os.path.isdir(p): for root, _, files in os.walk(p, followlinks=True): for f in files: x = os.path.join(root, f) _is_empty_file(results, x) else: add_result(results, p, 'missing') return results def main(): """Check that swift owns its relevant files and directories.""" # Check /etc/swift config_results = [] not_swift_owned_config(config_results) empty_files(config_results) # Check files under /srv/node data_results = [] not_swift_owned_data(data_results) # Generate metrics. Use the "reason" field from the *first* failure # in each category to populate the msg field for Severity.fail. If there # are several failures, the user will have to resolve them one by one. 
metrics = [] if config_results: metrics.append(MetricData.single(__name__ + '.config', Severity.fail, message='{message}', msgkeys=config_results[0])) else: metrics.append(MetricData.single(__name__ + '.config', Severity.ok, message='OK')) if data_results: metrics.append(MetricData.single(__name__ + '.data', Severity.fail, message='{message}', msgkeys=data_results[0])) else: metrics.append(MetricData.single(__name__ + '.data', Severity.ok, message='OK')) return metrics if __name__ == "__main__": main() 07070100000051000081A4000003E800000064000000015BE06E03000009FF000000000000000000000000000000000000004700000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/swift/replication.py # (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # import json from swiftlm.utils.metricdata import MetricData, timestamp, CheckFailure from swiftlm.utils.values import Severity, ServerType from swiftlm.utils.utility import SwiftlmCheckFailure RECON_PATH = '/var/cache/swift/' TIMEOUT = 2 BASE_RESULT = MetricData( name=__name__, messages={} ) def _recon_check(st): """ Parses the <server-type>.recon file and returns the age of the last replication. :param st: ServerType, used to determine the metric names and recon file name.
""" results = [] if not st.is_instance: return results r = BASE_RESULT.child(name=st.name + '.last_replication', dimensions={'component': '%s-replicator' % st.name}) recon_file = st.name + '.recon' try: with open(RECON_PATH + recon_file) as f: j = json.load(f) last_replication = j.get('replication_last') if last_replication is None: last_replication = j.get('object_replication_last') last_replication = int(last_replication) last_replication = timestamp() - last_replication except (ValueError, IOError) as e: raise SwiftlmCheckFailure('Error in %s: %s' % (RECON_PATH + recon_file, e)) r.value = last_replication results.append(r) return results def object_recon_check(): return _recon_check(ServerType.object) def container_recon_check(): return _recon_check(ServerType.container) def account_recon_check(): return _recon_check(ServerType.account) def main(): """Checks replication and health status.""" results = [] results.extend(object_recon_check()) results.extend(container_recon_check()) results.extend(account_recon_check()) return results 07070100000052000081A4000003E800000064000000015BE06E030000147D000000000000000000000000000000000000004A00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/swift/swift_services.py # (c) Copyright 2015-2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. 
# import os from swiftlm.utils.utility import \ server_type, get_all_proc_and_cmdlines,\ get_network_interface_conf, get_rsync_target_conf from swiftlm.utils.metricdata import MetricData, get_base_dimensions from swiftlm.utils.values import Severity BASE_RESULT = MetricData( name=__name__, messages={ 'fail': '{component} is not running', 'ok': '{component} is running', 'unknown': 'no swift services running', } ) SERVICES = [ "account-auditor", "account-reaper", "account-replicator", "account-server", "container-replicator", "container-server", "container-updater", "container-auditor", "container-reconciler", "container-sync", "object-replicator", "object-server", "object-updater", "object-auditor", "object-reconstructor", "proxy-server" ] def services_to_check(): # Filter SERVICES down to what should be running on the node. # server_type returns a dict of {'object': bool, etc} prefix_server_type = tuple(k for k, v in server_type().items() if v) services = [s for s in SERVICES if s.startswith(prefix_server_type)] return services def check_swift_processes(): results = [] services = services_to_check() if not services: c = BASE_RESULT.child() c.value = Severity.unknown return c for service in services: c = BASE_RESULT.child(dimensions={'component': service}) if not is_service_running(service): c.value = Severity.fail else: c.value = Severity.ok results.append(c) return results def is_service_running(service): for _, cmdline in get_all_proc_and_cmdlines(): if len(cmdline) >= 3: if (cmdline[1].endswith("swift-" + service) and cmdline[2].endswith(".conf")): return True # We reach here if no matching process was found in /proc return False def get_rsync_bind_ip(): rsync_running = False data = get_network_interface_conf() rsync_bind_ip_conf = data["rsync_bind_ip"] port = get_rsync_target_conf() rsync_bind_port_conf = port["rsync_bind_port"] for process, cmdline in get_all_proc_and_cmdlines(): if len(cmdline) >= 2: filename = None if os.path.exists(cmdline[0]):
filename = os.path.basename(os.path.realpath(cmdline[0])) if filename == 'rsync' and cmdline[1] == '--daemon': rsync_running = True rsync_laddr = process.connections() rsync_laddr_ip, rsync_laddr_port = rsync_laddr[0].laddr continue if not rsync_running: return False, False ip_port_match = False if (rsync_bind_ip_conf == rsync_laddr_ip and rsync_bind_port_conf == str(rsync_laddr_port)): ip_port_match = True return rsync_running, ip_port_match def check_rsync(): metrics = [] rsync_running, ip_port_match = get_rsync_bind_ip() if not rsync_running: dimensions = get_base_dimensions() dimensions["component"] = "rsync" metrics.append(MetricData.single('swiftlm.swift.swift_services', Severity.fail, message='rsync is not running', dimensions=dimensions)) return metrics else: dimensions = get_base_dimensions() dimensions["component"] = "rsync" metrics.append(MetricData.single('swiftlm.swift.swift_services', Severity.ok, message='rsync is running', dimensions=dimensions)) if not ip_port_match: dimensions = get_base_dimensions() dimensions["component"] = "rsync" metrics.append(MetricData.single( 'swiftlm.swift.swift_services.check_ip_port', Severity.fail, message='rsync is not listening on the correct ip or port', dimensions=dimensions)) else: dimensions = get_base_dimensions() dimensions["component"] = "rsync" metrics.append(MetricData.single( 'swiftlm.swift.swift_services.check_ip_port', Severity.ok, message='OK', dimensions=dimensions)) return metrics def main(): """Check that the relevant services are running.""" metrics = [] metrics.extend(check_swift_processes()) metrics.extend(check_rsync()) return metrics 
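The services_to_check() function above narrows the full SERVICES list using str.startswith with a *tuple* of prefixes, one per server type active on the node. A standalone sketch of that filter, with a hypothetical server_type() result passed in as a plain dict:

```python
SERVICES = [
    "account-auditor", "account-reaper", "account-replicator", "account-server",
    "container-replicator", "container-server", "container-updater",
    "container-auditor", "container-reconciler", "container-sync",
    "object-replicator", "object-server", "object-updater", "object-auditor",
    "object-reconstructor", "proxy-server",
]

def services_for(server_types):
    """Filter SERVICES down to the types flagged True in server_types."""
    # str.startswith accepts a tuple: the string matches if any prefix does.
    prefixes = tuple(k for k, v in server_types.items() if v)
    return [s for s in SERVICES if s.startswith(prefixes)]

# Hypothetical node running only the object and proxy services.
active = services_for({'account': False, 'container': False,
                       'object': True, 'proxy': True})
print(active)  # object-* services plus proxy-server, in SERVICES order
```

Note that startswith with an empty tuple matches nothing, which is why the original emits Severity.unknown when no server types are active.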
07070100000053000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000003A00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/systems07070100000054000081A4000003E800000064000000015BE06E0300000000000000000000000000000000000000000000004600000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/systems/__init__.py07070100000055000081A4000003E800000064000000015BE06E0300001B14000000000000000000000000000000000000004A00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/systems/check_mounts.py # (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # from __future__ import print_function import grp from math import ceil as ceil import os import pwd import json from collections import namedtuple from swiftlm.utils.values import Severity from swiftlm.utils.metricdata import MetricData from swiftlm.utils.utility import run_cmd from swiftlm.utils.utility import Aggregate, SwiftlmCheckFailure from swiftlm.utils.utility import get_swift_mount_point DEVICES = '/etc/ansible/facts.d/swift_drive_info.fact' LABEL_CHECK_DISABLED = '---NA---' Device = namedtuple('Device', ['device', 'mount', 'label']) MOUNT_PATH = get_swift_mount_point() def get_devices(): """ Parses ansible facts file in JSON format to discover drives. Required facts in the format are shown below. { ... "devices": [{ "name": "/dev/sd#", "swift_drive_name": "disk0", "label": "0000000d001", ... } ] ... 
} label is not currently in the file so we stub it out with LABEL_CHECK_DISABLED. """ try: with open(DEVICES, 'r') as f: data = json.load(f)['devices'] except (IOError, ValueError) as err: raise SwiftlmCheckFailure('Failure opening %s: %s' % (DEVICES, err)) devices = [] for d in data: devices.append(Device( device=d['name'], mount=MOUNT_PATH + d['swift_drive_name'], label=d.get('label', LABEL_CHECK_DISABLED) )) return devices def is_mounted(d, r): return os.path.ismount(d.mount) def is_mounted_775(d, r): # Take the last three digits of the octal repr of the permissions. perms = oct(os.stat(d.mount).st_mode)[-3:] if perms == '755': return True else: r.msgkey('permissions', perms) return False def is_ug_swift(d, r): """Checks mount point is owned by swift""" stats = os.stat(d.mount) uid = stats.st_uid gid = stats.st_gid user = pwd.getpwuid(uid).pw_name group = grp.getgrgid(gid).gr_name if user == group == 'swift': return True else: r.msgkey('user', user) r.msgkey('group', group) return False def is_valid_label(d, r): if d.label == LABEL_CHECK_DISABLED: return True rc = run_cmd('xfs_admin -l %s | grep -q %s' % (d.mount, d.label)) if rc.exitcode == 0: return True else: return False def is_xfs(d, r): rc = run_cmd('mount | grep -qE "%s.*xfs"' % d.mount) if rc.exitcode == 0: return True else: return False def is_valid_xfs(d, r): rc = run_cmd('xfs_info %s' % d.mount) if rc.exitcode == 0: return True else: return False BASE_RESULT = MetricData( name=__name__, messages={ is_mounted.__name__: '{device} not mounted at {mount}', is_mounted_775.__name__: ('{device} mounted at {mount} has permissions' ' {permissions} not 755'), is_ug_swift.__name__: ('{device} mounted at {mount} is not owned by' ' swift, has user: {user}, group: {group}'), is_valid_label.__name__: ('{device} mounted at {mount} has invalid ' 'label {label}'), is_xfs.__name__: '{device} mounted at {mount} is not XFS', is_valid_xfs.__name__: '{device} mounted at {mount} is corrupt', 'ok': '{device} mounted at {mount} ok' } )
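The is_mounted_775() check above compares the last three digits of the octal representation of a mount point's mode bits against '755'. A standalone sketch of that extraction, exercised on a temporary directory (`mode_digits` is a hypothetical helper, not part of swiftlm):

```python
import os
import tempfile

def mode_digits(path):
    """Last three octal digits of a path's permission bits, e.g. '755'.

    oct() of a directory mode such as 0o40755 ends in '755' in both
    Python 2 ('040755') and Python 3 ('0o40755'), so slicing works.
    """
    return oct(os.stat(path).st_mode)[-3:]

with tempfile.TemporaryDirectory() as d:
    os.chmod(d, 0o755)
    digits = mode_digits(d)
print(digits)  # '755'
```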
DISKUSAGE_RESULT = MetricData(name='diskusage.host', messages={}) def check_mounts(): results = [] checks = ( is_mounted, is_mounted_775, is_ug_swift, is_valid_label, is_xfs, is_valid_xfs) devices = get_devices() if not devices: raise SwiftlmCheckFailure('No devices found to check. See %s' % DEVICES) for d in devices: result = BASE_RESULT.child(dimensions={'mount': d.mount}, msgkeys={'device': d.device, 'label': d.label}) for check in checks: if not check(d, result): result.message = check.__name__ result.value = Severity.fail break else: result.value = Severity.ok results.append(result) return results def get_diskusage(device): """ Get diskusage data :param device: :returns: dictionary containing key data """ try: path = os.path.join(device.mount) disk = os.statvfs(path) used = float(disk.f_blocks - disk.f_bfree) avail = float(disk.f_bavail) size = float(disk.f_blocks) usedpercent = float(ceil(100.0 * used / (used + avail))) sizebytes = int(size * disk.f_frsize) usedbytes = int(used * disk.f_frsize) availbytes = int(avail * disk.f_frsize) return {'size': sizebytes, 'used': usedbytes, 'avail': availbytes, 'usage': usedpercent} except IOError: return {} def diskusage(): results = [] usage_aggr = Aggregate() devices = get_devices() for d in devices: for key, value in get_diskusage(d).items(): result = DISKUSAGE_RESULT.child(name='val.' 
+ key, dimensions={'mount': d.mount}, msgkeys={'device': d.device, 'label': d.label}) result.value = value results.append(result) if key == 'usage': usage_aggr.add(value) if usage_aggr.count: result = DISKUSAGE_RESULT.child(name='max.usage') result.value = usage_aggr.max results.append(result) result = DISKUSAGE_RESULT.child(name='min.usage') result.value = usage_aggr.min results.append(result) result = DISKUSAGE_RESULT.child(name='avg.usage') result.value = usage_aggr.avg results.append(result) return results def main(): """Checks the relevant swift mount points and diskusage""" results = [] results.extend(check_mounts()) results.extend(diskusage()) return results if __name__ == "__main__": print(main()) 07070100000056000081A4000003E800000064000000015BE06E03000027F6000000000000000000000000000000000000004A00000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/systems/connectivity.py # (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017-2018 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. 
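The get_diskusage() function above derives size/used/avail/usage from os.statvfs, converting block counts to bytes via f_frsize and computing the used percentage against used + f_bavail (the space visible to non-root users). A standalone sketch of the same arithmetic (`disk_usage` is a hypothetical helper, run here against '/' instead of a swift mount):

```python
import os
from math import ceil

def disk_usage(path):
    """Return size/used/avail in bytes plus a rounded-up usage percentage."""
    st = os.statvfs(path)
    used = float(st.f_blocks - st.f_bfree)    # blocks in use
    avail = float(st.f_bavail)                # blocks free to non-root users
    size = float(st.f_blocks)                 # total blocks
    return {
        'size': int(size * st.f_frsize),
        'used': int(used * st.f_frsize),
        'avail': int(avail * st.f_frsize),
        'usage': float(ceil(100.0 * used / (used + avail))),
    }

stats = disk_usage('/')
print(stats['usage'])  # rounded-up percent used, e.g. 42.0
```

Using f_bavail rather than f_bfree in the denominator matches df's use%: the root-reserved blocks are excluded, so usage reaches 100 when ordinary writes start failing.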
# import math import socket from threading import Thread, BoundedSemaphore import urlparse try: import configparser except ImportError: import ConfigParser as configparser import os from collections import namedtuple from swiftlm.utils.utility import ( get_ring_hosts, server_type, UtilityExeception ) from swiftlm.utils.metricdata import MetricData, get_base_dimensions from swiftlm.utils.values import Severity, ServerType from swiftlm.utils.utility import run_cmd # Connectivity needs to report out target hostname and observer hostname # rather than the normal hostname dimension _base_dimensions = dict(get_base_dimensions()) _base_dimensions['observer_host'] = socket.gethostname() BASE_RESULT = MetricData( name=__name__, messages={ 'ok': '{url} ok', 'fail': '{url} {fail_message}' }, dimensions=_base_dimensions ) MAX_THREAD_LIMIT = 10 CONNECT_TIMEOUT = 2.0 JOIN_WAIT = 10.0 SWIFT_PROXY_PATH = '/opt/stack/service/swift-proxy-server/etc' MEMCACHE_CONF_PATH = '/etc/swift' SWIFTLM_SCAN_PATH = '/etc/swiftlm' class HostPort(namedtuple('HostPort', ['host', 'port'])): @classmethod def from_string(cls, s): """ Create a HostPort instance from a string """ # Supports: # http://host.name, http://host.name:port # host.name, host.name:port # 10.10.10.10, 10.10.10.10:9999 s = s.strip() colon_count = s.count(':') if colon_count == 2: return cls(*s.rsplit(':', 1)) elif colon_count == 0: return cls(s, '0') elif colon_count == 1: colon_index = s.find(':') if (len(s) >= colon_index + 2 and s[colon_index + 1] == s[colon_index + 2] == '/'): # We ignore this, it is a URI scheme not a port. # We have to check the length of s first though, if s is # host:1 we could cause an indexerror. return cls(s, '0') else: return cls(*s.rsplit(':', 1)) class CheckThread(Thread): """ Threaded generic check """ def __init__(self, hostport, check_func, thread_limit, result, scheme=None): """ :params hostport: HostPort to be passed to check_func. 
        :param check_func: function that accepts a HostPort, performs a
            check and returns a bool indicating success or failure.
            True is success, False is a failure.
        :param thread_limit: BoundedSemaphore for limiting the number of
            active threads.
        :param result: MetricData that will contain the result of the
            thread's check.
        :param scheme: The HostPort is checked via http/https.
        """
        Thread.__init__(self)
        self.thread_limit = thread_limit
        self.check_func = check_func
        self.hostport = hostport
        self.result = result
        self.result.name += '.' + check_func.__name__
        if scheme:
            self.result['url'] = '%s://%s:%s' % (scheme, hostport.host,
                                                 hostport.port)
        else:
            self.result['url'] = '//%s:%s' % (hostport.host, hostport.port)
        # Ideally, we would indicate here that the hostname dimension
        # should not be overridden by Monasca-agent, but c'est la vie.
        self.result['hostname'] = '_'

    def run(self):
        self.thread_limit.acquire()
        check_result = self.check_func(self.hostport)
        if check_result[0]:
            self.result.value = Severity.ok
        else:
            self.result.msgkey('fail_message', check_result[1])
            self.result.value = Severity.fail
        self.thread_limit.release()


def connect_check(hp):
    s = None
    try:
        s = socket.create_connection(hp, CONNECT_TIMEOUT)
        return (True,)
    except (socket.error, socket.timeout) as e:
        return (False, str(e))
    finally:
        try:
            s.shutdown(socket.SHUT_RDWR)
        except Exception:
            pass


def memcache_check(hp):
    s = None
    try:
        s = socket.create_connection(hp, CONNECT_TIMEOUT)
        s.sendall('stats\n')
        _ = s.recv(1024)
        return (True,)
    except (socket.error, socket.timeout) as e:
        return (False, str(e))
    finally:
        try:
            s.shutdown(socket.SHUT_RDWR)
        except Exception:
            pass


# The metric name string is derived from the base metric name string plus
# the name of the function called to collect the metric; therefore we use
# rsync_check() to check the status of rsync while reusing the
# functionality of connect_check().
def rsync_check(hp):
    return connect_check(hp)


def ping_check(hp):
    try:
        cmd_result = run_cmd('ping -c 1 -A %s' % hp.host)
        if cmd_result.exitcode == 0:
            return (True,)
        else:
            return (False, "ping_check failed")
    except Exception:
        return (False, "ping_check failed")


def check(targets, check_func, results, scheme=None):
    threads = []
    thread_limit = BoundedSemaphore(value=MAX_THREAD_LIMIT)
    if not targets:
        # No hosts to check
        return
    for target in targets:
        t = CheckThread(target, check_func, thread_limit,
                        BASE_RESULT.child(), scheme=scheme)
        t.start()
        threads.append(t)

    # Join wait time logic: worst case is that all threads suffer a timeout.
    # Since MAX_THREAD_LIMIT threads run in parallel, we may spend
    # CONNECT_TIMEOUT * (num-threads/MAX_THREAD_LIMIT) seconds in the
    # socket connect. Add JOIN_WAIT to handle other overhead.
    wait_time = (JOIN_WAIT + CONNECT_TIMEOUT *
                 math.ceil(float(len(threads)) / float(MAX_THREAD_LIMIT)))
    for t in threads:
        t.join(wait_time)
        if t.isAlive():
            # Should not get here, but in case we do
            t.result.msgkey('fail_message', 'check thread did not complete')
            t.result.value = Severity.fail
        results.append(t.result)


def main():
    """Checks connectivity to memcache and object servers."""
    results = []

    if server_type(ServerType.proxy):
        cp = configparser.ConfigParser()
        cp.read(os.path.join(MEMCACHE_CONF_PATH, 'memcache.conf'))
        try:
            memcache_servers = [
                HostPort.from_string(s) for s in
                cp.get('memcache', 'memcache_servers').split(',')
            ]
        except configparser.NoSectionError:
            memcache_servers = []
        check(memcache_servers, memcache_check, results)

        # Check Keystone token-validation endpoint
        scheme = 'http'
        cp.read(os.path.join(SWIFT_PROXY_PATH, 'proxy-server.conf'))
        try:
            ise = cp.get('filter:authtoken', 'auth_url')
            parsed = urlparse.urlparse(ise)
            endpoint_servers = [HostPort(parsed.hostname, str(parsed.port))]
            scheme = parsed.scheme
        except configparser.NoSectionError:
            endpoint_servers = []
        check(endpoint_servers, connect_check, results, scheme=scheme)

    # rsync is required for ACO servers so filter on these server_type()
    if (server_type(ServerType.account) or
            server_type(ServerType.container) or
            server_type(ServerType.object)):
        # swiftlm-scan.conf is the ansible-generated source of truth;
        # default in the case of ansible not laying down the rsync-target
        # port
        cp = configparser.ConfigParser()
        cp.read(os.path.join(SWIFTLM_SCAN_PATH, 'swiftlm-scan.conf'))
        # This assumes (rightfully so) that all nodes will be using the
        # same rsync_bind_port, as opposed to querying each node for its
        # possibly uniquely-configured port.
        try:
            rsync_bind_port = cp.get('rsync-target', 'rsync_bind_port')
        except (configparser.NoSectionError, configparser.NoOptionError):
            rsync_bind_port = '873'

        try:
            # Retrieve a unique list of nodes in the ring using the ring
            # file and the configured replication network IP.
            rsync_targets = []
            devices = get_ring_hosts(ring_type=None)
            rsync_set = set()
            for device in devices:
                if device.replication_ip not in rsync_set:
                    rsync_host = socket.gethostbyaddr(device.replication_ip)
                    rsync_targets.append(HostPort(rsync_host[0],
                                                  rsync_bind_port))
                    rsync_set.add(device.replication_ip)
        except Exception:
            pass
        check(rsync_targets, rsync_check, results)

    # TODO -- rewrite this as a connect_check
    # try:
    #     ping_targets = []
    #     devices = get_ring_hosts(ring_type=None)
    #     ip_set = set()
    #
    #     for device in devices:
    #         if device.ip not in ip_set:
    #             # Port not relevant for ping_check. (Empty string is an
    #             # invalid dimension value, hence '_' used for target_port)
    #             ping_targets.append(HostPort(device.ip, '_'))
    #             ip_set.add(device.ip)
    #
    # except Exception:  # noqa
    #     # may be some problem loading ring files, but not concern of this
    #     # check to diagnose any further.
    #     pass
    #
    # check(ping_targets, ping_check, results)

    return results


---- python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/systems/ntp.py ----

# (c) Copyright 2015 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017-2018 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#

import re

from swiftlm.utils.metricdata import MetricData, CheckFailure
from swiftlm.utils.values import Severity
from swiftlm.utils.utility import run_cmd

BASE_RESULT = MetricData(
    name=__name__,
    messages={
        'ok': 'OK',
        'fail': 'ntpd not running: {error}',
    }
)


def check_status():
    cmd_result = run_cmd('systemctl -q is-active ntpd.service')
    r = BASE_RESULT.child()
    if cmd_result.exitcode != 0:
        r['error'] = cmd_result.output
        r.value = Severity.fail
    else:
        r.value = Severity.ok
    return [r]


def check_details():
    """
    Parses ntp data in the form:

         remote          refid   st t when poll reach  delay  offset jitter
    ========================================================================
     bindcat.fhsu.ed .INIT.      16 u    - 1024     0  0.000   0.000  0.000
     origin.towfowi. .INIT.      16 u    - 1024     0  0.000   0.000  0.000
     time-b.nist.gov .INIT.      16 u    - 1024     0  0.000   0.000  0.000
     services.quadra .INIT.      16 u    - 1024     0  0.000   0.000  0.000

    associd=0 status=c011 leap_alarm, sync_unspec, 1 event, freq_not_set,
    version="ntpd 4.2.6p5@1.2349-o Fri Apr 10 19:04:04 UTC 2015 (1)",
    processor="x86_64", system="Linux/3.14.44-1-amd64-default", leap=11,
    stratum=16, precision=-23, rootdelay=0.000, rootdisp=26.340,
    refid=INIT, reftime=00000000.00000000 Mon, Jan 1 1900 0:00:00.000,
    clock=d94f932a.13f33874 Tue, Jul 14 2015 13:54:50.077, peer=0, tc=3,
    mintc=3, offset=0.000, frequency=0.000, sys_jitter=0.000,
    clk_jitter=0.000, clk_wander=0.000
    """
    results = []

    cmd_result = run_cmd('ntpq -pcrv')
    if cmd_result.exitcode != 0:
        failed = CheckFailure.child(
            dimensions={
                'check': BASE_RESULT.name,
                'error': cmd_result.output,
            }
        )
        failed.value = Severity.fail
        return [failed]

    results.append(check_ntpq_fact(cmd_result, 'stratum'))
    results.append(check_ntpq_fact(cmd_result, 'offset'))
    return results


def check_ntpq_fact(cmd_result, fact_name):
    fact_result = BASE_RESULT.child(fact_name)
    # This regex will pick out the value after a fact, e.g.
    #     stratum=16,
    # will match and the value '16' will be stored in 'match.groups()[0]'.
    # If the output does not match the regex, 'match' will be None.
    fact_regex = re.compile(fact_name + '=(.*?),')
    match = fact_regex.search(cmd_result.output)

    if match is None:
        failed = CheckFailure.child(
            dimensions={
                'check': fact_result.name,
                'error': 'Output does not contain "%s"' % fact_name,
            }
        )
        failed.value = Severity.fail
        return failed
    else:
        fact_level = match.groups()[0]
        fact_result.value = fact_level
        return fact_result


def main():
    """Checks that ntp is running on the server."""
    results = []
    results.extend(check_status())
    results.extend(check_details())
    return results


---- python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/systems/system.py ----

# (c) Copyright 2015 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# import re from swiftlm.utils.metricdata import MetricData, CheckFailure from swiftlm.utils.values import Severity from swiftlm.utils.utility import run_cmd BASE_RESULT = MetricData( name='load.host', messages={} ) def _get_proc_file(path): return open(path, mode='r').read() def get_load_average(): r = BASE_RESULT.child(name='val.five') load_avg_data = _get_proc_file('/proc/loadavg') r.value = float(load_avg_data.split()[1]) return [r] def main(): """ Get system data (such as load average) """ results = [] results.extend(get_load_average()) return results 07070100000059000041ED000003E800000064000000025BE06E0300000000000000000000000000000000000000000000003800000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/utils0707010000005A000081A4000003E800000064000000015BE06E0300000111000000000000000000000000000000000000004400000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/utils/__init__.pySWIFT_PATH = "/etc/swift/" PROXY_PATH = "/opt/stack/service/swift-proxy-server/etc/" ACCOUNT_PATH = "/opt/stack/service/swift-account-server/etc/" CONTAINER_PATH = "/opt/stack/service/swift-container-server/etc/" OBJECT_PATH = "/opt/stack/service/swift-object-server/etc/" 0707010000005B000081ED000003E800000064000000015BE06E0300000F3B000000000000000000000000000000000000004500000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/utils/drivedata.py#!/usr/bin/env python # (c) Copyright 2015 Hewlett Packard Enterprise Development LP # (c) Copyright 2017 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the
# License for the specific language governing permissions and limitations
# under the License.
#

import yaml

DISK_MOUNT = 'disk'
LVM_MOUNT = 'lvm'


class SwiftlmInvalidConfig(Exception):
    pass


class Drive(object):
    def __init__(self, device, consumer, swift_name):
        self.device = device
        self.consumer = consumer
        self.swift_device_name = swift_name

    @classmethod
    def load(cls, file_name):
        try:
            data = yaml.safe_load(open(file_name).read())
        except yaml.YAMLError as err:
            raise SwiftlmInvalidConfig("Unable to read yaml in %s - %s"
                                       % (file_name, err))
        except IOError as io_err:
            raise SwiftlmInvalidConfig("Unable to read %s - %s"
                                       % (file_name, io_err))
        return cls.get_drives(data)

    @classmethod
    def get_drives(cls, data):
        drives = []
        device_num = 0
        if "device_groups" in data:
            for device_group in data['device_groups']:
                for device in device_group['devices']:
                    if device_group.get('consumer'):
                        if device_group['consumer']['name'] == 'swift':
                            drives.append(
                                cls(device=device['name'],
                                    consumer=device_group['consumer'],
                                    swift_name=DISK_MOUNT + str(device_num)))
                            device_num += 1
        return drives

    def __repr__(self):
        return 'Drive("{}", {}, "{}")'.format(
            self.device, self.consumer, self.swift_device_name)


class LogicalVol(object):
    def __init__(self, lvm, consumer, swift_name, lvg):
        self.lvm = lvm
        self.consumer = consumer
        self.swift_lvm_name = swift_name
        self.lvg = lvg

    @classmethod
    def load(cls, file_name):
        try:
            data = yaml.safe_load(open(file_name).read())
        except yaml.YAMLError as err:
            raise SwiftlmInvalidConfig("Unable to read yaml in %s - %s"
                                       % (file_name, err))
        except IOError as io_err:
            raise SwiftlmInvalidConfig("Unable to read %s - %s"
                                       % (file_name, io_err))
        return cls.get_lvms(data)

    @classmethod
    def get_lvms(cls, data):
        lvms = []
        lvm_num = 0
        if "volume_groups" in data:
            for volume_group in data['volume_groups']:
                for lv in volume_group['logical_volumes']:
                    lvm_name = lv.get('name')
                    consumer = lv.get('consumer')
                    if consumer and isinstance(consumer, dict):
                        if consumer.get('name', 'not-swift') == 'swift':
                            lvms.append(
                                cls(consumer=consumer,
                                    swift_name=LVM_MOUNT + str(lvm_num),
                                    lvm=lvm_name,
                                    lvg=volume_group['name']))
                            lvm_num += 1
        return lvms

    def __repr__(self):
        return 'Logical_Vol("{}", {}, "{}", "{}")'.format(
            self.lvm, self.consumer, self.swift_lvm_name, self.lvg)


---- python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/utils/jahmonapi.py ----

#!/usr/bin/python
# (c) Copyright 2015,2016 Hewlett Packard Enterprise Development LP
# (c) Copyright 2017-2018 SUSE LLC
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#

import logging
from time import sleep, time
import datetime

import requests
import json

logger = logging.getLogger('swiftlm-jmoncli')
logger.addHandler(logging.NullHandler())


def _import_keystone_client(auth_version):
    try:
        if auth_version == 3:
            from keystoneclient.v3 import client as ksclient
        else:
            from keystoneclient.v2_0 import client as ksclient
        from keystoneclient import exceptions
        # prevent keystoneclient warning us that it has no log handlers
        logger = logging.getLogger("swiftclient")  # noqa
        return ksclient, exceptions
    except ImportError as err:
        raise Exception('Import error: %s' % err)


def get_auth_keystone(auth_url, user, key, os_options, **kwargs):
    """
    Authenticate against a keystone server.

    We are using the keystoneclient library for authentication.
""" insecure = kwargs.get('insecure', False) timeout = kwargs.get('timeout', None) auth_version = 2 if auth_url.endswith('/v3'): auth_version = 3 debug = logger.isEnabledFor(logging.DEBUG) and True or False ksclient, exceptions = _import_keystone_client(auth_version) try: _ksclient = ksclient.Client( username=user, password=key, tenant_name=os_options.get('tenant_name'), tenant_id=os_options.get('tenant_id'), user_id=os_options.get('user_id'), user_domain_name=os_options.get('user_domain_name'), user_domain_id=os_options.get('user_domain_id'), project_name=os_options.get('project_name'), project_id=os_options.get('project_id'), project_domain_name=os_options.get('project_domain_name'), project_domain_id=os_options.get('project_domain_id'), debug=debug, cacert=kwargs.get('cacert'), auth_url=auth_url, insecure=insecure, timeout=timeout) except exceptions.Unauthorized: msg = 'Unauthorized. Check username, password and project name/id.' raise JahmonClientException(msg) except exceptions.AuthorizationFailure as err: raise JahmonClientException('Authorization Failure. %s' % err) service_type = os_options.get('service_type') or 'monitoring' endpoint_type = os_options.get('endpoint_type') or 'internalURL' try: endpoint = _ksclient.service_catalog.url_for( attr='region', filter_value=os_options.get('region_name', None), service_type=service_type, endpoint_type=endpoint_type) except exceptions.EndpointNotFound: raise JahmonClientException('Endpoint not found') return endpoint, _ksclient.auth_token class JahmonClientException(Exception): """ Raised for Monasca API error responses If the error relates to a REST API operation, http_status contains the HTTP error code. Otherwise, the value of http_status is 0. 
""" def __init__(self, msg, http_status=0): Exception.__init__(self, msg) self.msg = msg self.http_status = http_status def json_response(request): try: return json.loads(request.text) except Exception: raise JahmonClientException('Invalid JSON in response body') def raise_error(request, exc): """ Attempt to pull error from the HTTP response; otherwise throw original """ if isinstance(exc, JahmonClientException): raise exc status_code = None text = '' try: status_code = request.status_code text = request.text except Exception: pass if status_code: raise JahmonClientException('HTTP Code: %s ; Body:%s' % (status_code, text)) else: raise exc class JahmonConnection(object): """ Monasca API interface Provides an interface to the Monasca API. You create a connection object and then use this object to perform operations. Example: conn = JahmonConnection(auth_url='https://key-vip:35357/v2.0', username='myname', password='secret', project_id='123456789123456', region='region1') try: metrics = conn.get_metrics(process.pid_count, dimensions={'process_name': 'monasca-notification'}) for metric in metrics: items = conn.get_measurements(metric, -10, 0) for item in items: print('%s %s: %s' % (item.get('timestamp', item.get('dimensions'), item.get('value')) except JahmonClientException as err: print 'Got %s code; reason: %s' % (err.http_status, err) Exception Handling: If you get errors from the API, the JahmonConnection object raises a JahmonClientException. However, other errors can result in other exceptions. We suggest you catch all exceptions and check for the http_status attribute. If it exists, the error came from the Monasca service itself. Otherwise, the error came from elsewhere. The JahmonConnection object automatically retries operations to handle temporary downtime or glitches in the Identity or Monasca service. There is little point in re-trying an operation yourself within seconds of getting an exception -- wait at least 60 seconds before retrying. 
    There is no need to create a new JahmonConnection object before
    retrying -- you can use the same JahmonConnection object for the
    lifetime of the program. You only need a new object if the credentials
    change.
    """

    def __init__(self, auth_url=None, username=None, password=None,
                 user_id=None, project_id=None, region_name=None,
                 user_domain_id=None, user_domain_name=None,
                 project_name=None, project_domain_name=None,
                 project_domain_id=None, jahmon_url=None,
                 jahmon_token=None, timeout=2.0):
        """
        Create a connection object

        To perform an operation that requires an authentication token, you
        must create the connection using the auth_url. If you know the
        Monasca endpoint and have an authentication token, specify the
        jahmon_url and jahmon_token directly.

        With auth_url, the credentials (username, password, project_id and
        region_name) are required. The credentials are not used until you
        perform an operation. If you need to check them, perform an
        operation (such as get_versions()). Tokens are automatically
        renewed.
        """
        if not (auth_url or jahmon_url):
            raise JahmonClientException('Must specify auth_url or jahmon_url')
        if auth_url and jahmon_url:
            raise JahmonClientException(
                'Cannot specify both auth_url and jahmon_url')
        if auth_url:
            if not (project_id or project_name):
                raise JahmonClientException('Must specify project_id/name'
                                            ' and (optionally) region_name')
            if not ((username or user_id) and password):
                raise JahmonClientException('Must specify username and'
                                            ' password')
        if jahmon_token and not jahmon_url:
            raise JahmonClientException('Must specify jahmon_url with'
                                        ' jahmon_token')
        self.auth_url = auth_url
        self.username = username
        self.password = password
        self.os_options = {}
        self.os_options['project_id'] = project_id
        self.os_options['region_name'] = region_name
        self.os_options['user_id'] = user_id
        self.os_options['project_name'] = project_name
        self.os_options['project_domain_name'] = project_domain_name
        self.os_options['project_domain_id'] = project_domain_id
        self.os_options['user_domain_name'] = user_domain_name
        self.os_options['user_domain_id'] = user_domain_id
        self.retries = 2
        self.url, self.token = (jahmon_url, jahmon_token)
        self.status_code = 0
        self.timeout = timeout
        self.auth_attempts = 0
        self.attempts = 0

    def _get_auth(self):
        """
        Get endpoint and token values from the identity service
        """
        try:
            (url, token) = get_auth_keystone(self.auth_url, self.username,
                                             self.password, self.os_options)
        except JahmonClientException as err:
            raise JahmonClientException('Cannot get token: %s' % err,
                                        http_status=401)
        return (url, token)

    def _get_url_token(self):
        """
        Get endpoint and token to use with an operation

        Caches the endpoint and token to reduce the cost of authenticating.
""" if self.url: return (self.url, self.token) self.auth_attempts = 0 while True: self.auth_attempts += 1 try: self.url, self.token = self._get_auth() return (self.url, self.token) except JahmonClientException as err: if self.auth_attempts > self.retries: self.url, self.token = (None, None) raise sleep(1.0) return (self.url, self.token) @classmethod def _timestamp(self): """ Return time rounded to one minute ago in milliseconds """ now = time() return int(now / 60) * 60 * 1000 @classmethod def utctime(cls, dt=None): """ Get ISO 8601 combined date and time format in UTC :param dt: Optional time or offset. If not specified, returns time now. Otherwise, any of the following may be used: - Integer offset. Examples, -1, 0, 2. This is added to the minutes of the current time. - String offset. Examples '-1', '0', '2'. This is added to the minutes of the current time. - A string formatted as %Y/%m/%dT%H:%M:%S.%fZ - A datetime.datetime object """ if isinstance(dt, int) or isinstance(dt, float): dt = str(dt) if not dt: rt = datetime.datetime.utcnow() elif isinstance(dt, str) and ':' not in dt: rt = datetime.datetime.utcnow() rt = datetime.datetime(rt.year, rt.month, rt.day, rt.hour, rt.minute, rt.second, 0) rt += datetime.timedelta(seconds=int(float(dt) * 60)) elif isinstance(dt, str): rt = datetime.datetime.strptime(dt, '%Y-%m-%dT%H:%M:%S.%fZ') else: rt = datetime.datetime(dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, 0) return rt.strftime('%Y-%m-%dT%H:%M:%S.%fZ') def get_versions(self): """ GET API versions :returns: object loaded with the JSON response from the API """ self.attempts = 0 while True: self.attempts += 1 try: request = None url, token = self._get_url_token() bits = url.split('/') url = bits[0] + '//' + bits[2] headers = {'x-auth-token': token, 'content-length': str(0), 'accept': 'application/json'} request = requests.get(url, headers=headers, timeout=self.timeout) self.status_code = request.status_code request.raise_for_status() return 
json_response(request) except Exception as err: if self.attempts > self.retries: raise_error(request, err) self.url, self.token = (None, None) sleep(0.5) def get_metrics(self, metric_name=None, dimensions={}, for_project=None, offset=None): """ Gets metrics The metrics are returned from a generator. Pagination is automatically handled so the metric list is unlimited. Nevertheless, the offset argument may be used. :param metric_name: A metric name. Optional :param dimensions: Dimensions (dictionary) Optional :param for_project: See Monasca API. Optional :param offset: Optional offset. :returns: An iterator. Each item is a dictionary with the following keys: - name - dimensions - id """ response = self.get_metrics_api(metric_name, dimensions=dimensions, for_project=for_project, offset=offset) metrics = self._parse_metrics_response(response) for metric in metrics: yield metric while metrics: if not metrics: break end_metric = metrics[-1] offset = end_metric.get('id') response = self.get_metrics_api(metric_name, dimensions=dimensions, for_project=for_project, offset=offset) metrics = self._parse_metrics_response(response) for metric in metrics: yield metric def _parse_metrics_response(self, response): results = [] elements = response.get('elements') for element in elements: results.append(element) return results def get_metrics_api(self, metric_name=None, dimensions={}, for_project=None, offset=None, limit=None): """ Thin wrapper around the GET /metrics :param metric_name: Metric name. Optional :param dimensions: Dimensions. Optional :param for_project: See Monasca API :param offset: Offset to start from. Optional :param limit: Limits number of metrics returned. 
        :returns: object loaded with the JSON response from the API
        """
        self.attempts = 0
        while True:
            self.attempts += 1
            try:
                request = None
                url, token = self._get_url_token()
                url += '/metrics'
                headers = {'x-auth-token': token,
                           'content-length': str(0),
                           'accept': 'application/json'}
                params = {}
                if metric_name:
                    params['name'] = metric_name
                if for_project:
                    params['tenant_id'] = for_project
                if dimensions:
                    key_vals = []
                    for key in dimensions.keys():
                        key_vals.append(key + ':' + dimensions[key])
                    key_values = ','.join(key_vals)
                    params['dimensions'] = key_values
                if offset:
                    params['offset'] = offset
                if limit:
                    params['limit'] = limit
                request = requests.get(url, params=params, headers=headers,
                                       timeout=self.timeout)
                self.status_code = request.status_code
                request.raise_for_status()
                return json_response(request)
            except Exception as err:
                if self.attempts > self.retries:
                    raise_error(request, err)
                self.url, self.token = (None, None)
                sleep(0.5)

    def get_measurements(self, metric, start_time, end_time,
                         for_project=None, offset=None, merge_metrics=False,
                         count=None):
        """
        Get measurements for a given metric

        This function returns a list of measurement objects given a metric
        object. It handles pagination. The results are returned by a
        generator so an unlimited number of measurements can be retrieved.

        The start and end times can be expressed in different ways. See the
        utctime() description.

        Unlike the GET /metrics/measurements API, if metrics are merged,
        each returned item contains the dimensions of the request -- not {}.

        :param metric: a metric object. This is a dict with the following
            keys:

            - name (metric name)
            - dimensions (dictionary)
        :param start_time: Oldest measurement to get
        :param end_time: Most recent measurement to get
        :param for_project: See Monasca API. Optional
        :param offset: Offset. Optional.
        :param merge_metrics: If the specified name/dimensions match
            multiple metrics, merge the results.
        :returns: An iterator.
            Each item is a dictionary containing the following keys:

            - name (string)
            - dimensions (dictionary)
            - value (string)
            - timestamp (string)
            - value_meta (dictionary)
            - id (string)
        """
        start_time = self.utctime(start_time)
        end_time = self.utctime(end_time)
        dimensions = metric.get('dimensions')
        response = self.get_measurements_api(metric.get('name'), start_time,
                                             end_time, dimensions=dimensions,
                                             for_project=for_project,
                                             merge_metrics=merge_metrics,
                                             offset=offset)
        measurements = self._parse_measurements_response(response,
                                                         dimensions, count)
        for measurement in measurements:
            yield measurement
        while measurements:
            end_measurement = measurements[-1]
            offset = end_measurement.get('timestamp')
            response = self.get_measurements_api(metric.get('name'),
                                                 start_time, end_time,
                                                 dimensions=dimensions,
                                                 for_project=for_project,
                                                 merge_metrics=merge_metrics,
                                                 offset=offset)
            measurements = self._parse_measurements_response(response,
                                                             dimensions,
                                                             count)
            for measurement in measurements:
                yield measurement

    def _parse_measurements_response(self, response, metric_dimensions,
                                     count):
        results = []
        elements = response.get('elements')
        for element in elements:
            if element.get('dimensions') == {}:
                dimensions = metric_dimensions
            else:
                dimensions = element.get('dimensions')
            measurements = element.get('measurements')
            id = element.get('id')
            name = element.get('name')
            columns = element.get('columns')
            col_keys = {}
            for index, key in enumerate(columns):
                col_keys[key] = index
            if count:
                measurements_of_interest = measurements[-count:]
            else:
                measurements_of_interest = measurements
            for measurement in measurements_of_interest:
                result = {'dimensions': dimensions,
                          'value': measurement[col_keys.get('value')],
                          'timestamp': measurement[
                              col_keys.get('timestamp')],
                          'value_meta': measurement[
                              col_keys.get('value_meta')],
                          'id': id,
                          'name': name}
                results.append(result)
        return results

    def get_measurements_api(self, metric_name, start_time, end_time,
                             dimensions={}, for_project=None,
                             merge_metrics=False,
offset=None): """ Thin wrapper for GET /metrics/measurements The start and end times can be expressed in different ways. See the utctime() description. :param metric_name: Metric name. Required. :param start_time: Start time :param end_time: End time :param dimensions: Dimensions. Optional :param for_project: See Monasca API :param merge_metrics: See Monasca PI :param offset: Offset. :returns: object loaded with the JSON response from the API """ self.attempts = 0 while True: self.attempts += 1 try: request = None url, token = self._get_url_token() url += '/metrics/measurements' headers = {'x-auth-token': token, 'content-length': str(0), 'accept': 'application/json'} params = {} if metric_name: params['name'] = metric_name if for_project: params['tenant_id'] = for_project if dimensions: key_vals = [] for key in dimensions.keys(): key_vals.append(key + ':' + dimensions[key]) key_values = ','.join(key_vals) params['dimensions'] = key_values params['start_time'] = self.utctime(start_time) params['end_time'] = self.utctime(end_time) params['merge_metrics'] = merge_metrics if offset: params['offset'] = offset request = requests.get(url, params=params, headers=headers, timeout=self.timeout) self.status_code = request.status_code request.raise_for_status() return json_response(request) except Exception as err: if self.attempts > self.retries: raise_error(request, err) self.url, self.token = (None, None) sleep(0.5) def post_metric(self, metric_name, value, dimensions={}, value_meta=None, timestamp=None, for_project=None): """ :param metric_name: The metric name. Required :param value: The value (int, float or string). Required :param dimensions: Dimensions (dictionary). Optional. :param value_meta: Value meta (dictionary). Optional :param timestamp: Timestamp. If omitted, posted as current time., :param for_project: See Monasca API :returns: Nothing. 
Raises exception if there is a problem """ timestamp = timestamp or self._timestamp() self.attempts = 0 while True: self.attempts += 1 try: request = None url, token = self._get_url_token() url += '/metrics' headers = {} if token: headers['x-auth-token'] = token headers['content-type'] = 'application/json' params = {} if for_project: params['tenant_id'] = for_project payload = {'name': metric_name, 'dimensions': dimensions, 'timestamp': int(timestamp - 1), 'value': float(value)} if value_meta: payload['value_meta'] = value_meta request = requests.post(url, params=params, headers=headers, timeout=self.timeout, data=json.dumps(payload)) self.status_code = request.status_code request.raise_for_status() return except Exception as err: if self.attempts > self.retries: raise_error(request, err) self.url, self.token = (None, None) sleep(0.5) 0707010000005D000081A4000003E800000064000000015BE06E03000029C3000000000000000000000000000000000000004600000000python-swiftlm-8.0+git.1541434883.e0ebe69/swiftlm/utils/log_tailer.py # (c) Copyright 2016 Hewlett Packard Enterprise Development LP # (c) Copyright 2017 SUSE LLC # # Licensed under the Apache License, Version 2.0 (the "License"); you may # not use this file except in compliance with the License. You may obtain # a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. # from collections import defaultdict import os from stat import ST_INO class LogTailer(object): """ Read lines off tail end of a log file This class allows you to read lines at the end of a log file. If the file is closed and rotated, we reopen the file. 
""" def __init__(self, log_file_name): self.log_file_name = log_file_name self.log_fd = open(log_file_name, 'r') self.log_fd.seek(0, os.SEEK_END) def lines(self): ''' Read lines from end of file This generator returns lines that have been writen to the log file since the last time lines() was called. ''' while True: line = self.log_fd.readline() if not line: # Get i-node number of the file descriptor and the # file on disk. stat_log_file = os.stat(self.log_file_name) stat_file_object = os.fstat(self.log_fd.fileno()) if stat_log_file.st_ino != stat_file_object[ST_INO]: # The i-node changed, so reopen log file self.log_fd.close() self.log_fd = open(self.log_file_name) line = self.log_fd.readline() if not line: break yield line # # Operation counting classes # # These classes are used to record stats for each access made. They count # number of ops, bytes_put and bytes_get as a total, per-project and # per-container (in a project) # # Notes: # - Any access counts as an operation (no matter verb/status) # - Only successful puts/gets are counted in the put/get bytes # - Only put/gets to an object are counted in put/get bytes # - PUT or POST is counted as a put bytes (COPY is server-side) # class HttpStatus(object): def __init__(self, http_status): self.status = int(http_status) def is_success(self): if self.status >= 200 and self.status < 300: return True return False class OpsRecorder(object): def __init__(self): self.name = '' self.ops = 0 self.bytes_put = 0 self.bytes_get = 0 def record_op(self, name, verb, http_status, bytes_transferred, obj): self.name = name if HttpStatus(http_status).is_success() and obj: if verb.lower() in ['put', 'post']: self.bytes_put += bytes_transferred elif verb.lower() in ['get']: self.bytes_get += bytes_transferred else: pass self.ops += 1 def get_stats(self): return {'name': self.name, 'ops': self.ops, 'bytes_put': self.bytes_put, 'bytes_get': self.bytes_get} def __repr__(self): return '%s' % self.get_stats() class 
ProjectRecorder(object): def __init__(self): self.ops = OpsRecorder() self.containers = defaultdict(OpsRecorder) def record_op(self, project, verb, http_status, bytes_transferred, container=None, obj=None): self.ops.record_op(project, verb, http_status, bytes_transferred, obj) if container: self.containers[container].record_op(container, verb, http_status, bytes_transferred, obj) def get_stats(self): return self.ops.get_stats() def get_containers(self): for key in self.containers.keys(): yield self.containers.get(key) def __repr__(self): repr = '' repr += ' stats: %s\n' % self.ops for container in self.get_containers(): repr += ' container: %s' % container return repr class AccessStatsRecorder(object): """ Record stats for operations """ def __init__(self): self.ops = OpsRecorder() self.projects = defaultdict(ProjectRecorder) def record_op(self, verb, http_status, bytes_transferred, project=None, container=None, obj=None): self.ops.record_op('total', verb, http_status, bytes_transferred, obj) if project: self.projects[project].record_op(project, verb, http_status, bytes_transferred, container, obj) def get_stats(self): return self.ops.get_stats() def get_projects(self): for key in self.projects.keys(): yield self.projects.get(key) def __repr__(self): repr = '' repr += 'stats: %s\n' % self.ops for project in self.get_projects(): repr += ' project:\n %s\n' % project return repr def split_path(path): """ Splits path into component parts :param path: the path (e.g., /v1/AUTH_1234/c/o/sub) :return: a tuple containing account, container, obj """ account = container = obj = None try: _, _, account = path.split('/', 2) _, _, account, container = path.split('/', 3) if not container: container = None _, _, account, container, obj = path.split('/', 4) if not obj: obj = None except ValueError: pass return account, container, obj def parse_proxy_log_message(line, reseller_prefixes): """ Extract proxy access record from a Swift log This function parses a Swift log looking for 
messages written by the proxy-logging middleware. Matching records result in a dict cotaining key information about the transaction. Otherwise a string type is returned for any ofthe following: - Not written by proxy-logging middleware - Where swift.source is not '-' (i.e., an internal request) - Where the account starts with '.' (e.g., reconciler account) :param line: a single line of text, describes a single transaction. A valid line from the proxy-logging middleware looks like the following: : HTTP/1.0 - - -
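As a quick sanity check of the path parsing in `split_path` above, the following is a standalone copy of that function together with a few example inputs (the example paths are hypothetical, chosen to match the `/v1/AUTH_1234/c/o/sub` pattern shown in its docstring). Each successive unpacking peels off one more path component; when a split yields too few parts the resulting ValueError is caught, leaving the components parsed so far intact.

```python
def split_path(path):
    # Copy of split_path from swiftlm/utils/log_tailer.py, for illustration.
    account = container = obj = None
    try:
        # Each unpack peels off one more component; a failed unpack raises
        # ValueError before assignment, preserving the earlier results.
        _, _, account = path.split('/', 2)
        _, _, account, container = path.split('/', 3)
        if not container:
            container = None
        _, _, account, container, obj = path.split('/', 4)
        if not obj:
            obj = None
    except ValueError:
        pass
    return account, container, obj


print(split_path('/v1/AUTH_1234/c/o/sub'))  # ('AUTH_1234', 'c', 'o/sub')
print(split_path('/v1/AUTH_1234/c'))        # ('AUTH_1234', 'c', None)
print(split_path('/v1/AUTH_1234'))          # ('AUTH_1234', None, None)
```

Note that anything beyond the object name (such as `o/sub`) stays part of the object, which matches Swift's treatment of object names containing slashes.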